Many of the segment polarity proteins of Drosophila and other invertebrates
are closely related to vertebrate proteins, implying that the molecular mechanisms
involved are ancient. Among the vertebrate proteins related to the fly genes are
En-1 and -2, which act in vertebrate brain development, and WNT-1, which is also
involved in brain development but was first found as the oncogene implicated in
many cases of mouse breast cancer. In flies, the patched gene is transcribed into
RNA in a complex and dynamic pattern in embryos, including fine transverse stripes
in each body segment primordium. The encoded protein is predicted to contain
many transmembrane domains. It has no significant similarity to any other known
protein. Other proteins having large numbers of transmembrane domains include a
variety of membrane receptors, channels through membranes and transporters
through membranes.

The hedgehog (HH) protein of flies has been shown to have at least three
vertebrate relatives: Sonic hedgehog (Shh), Indian hedgehog, and Desert hedgehog.
Shh is expressed in a group of cells at the posterior of each developing limb
bud. This is exactly the same group of cells found to have an important role in
signaling polarity to the developing limb. The signal appears to be graded, with
cells close to the posterior source of the signal forming posterior digits and other
limb structures and cells farther from the signal source forming more anterior
structures. It has been known for many years that transplantation of the signaling
cells, a region of the limb bud known as the "zone of polarizing activity" (ZPA), has
dramatic effects on limb patterning. Implanting a second ZPA anterior to the limb
bud causes a limb to develop with posterior features replacing the anterior ones (in
essence little fingers instead of thumbs). Shh has been found to be the long sought
ZPA signal. Cultured cells making Shh protein (SHH), when implanted into the
anterior limb bud region, have the same effect as an implanted ZPA. This
establishes that Shh is a critical trigger of posterior limb development.

The factor in the ZPA has been thought for some time to be related to
another important developmental signal that polarizes the developing spinal cord.
The notochord, a rod of mesoderm that runs along the dorsal side of early vertebrate
embryos, is a signal source that polarizes the neural tube along the dorsal-ventral
axis. The signal causes the part of the neural tube nearest to the notochord to form






WO 96/11260

PCT/US95/13233

[...] the floor plate, in [...] out signals to the more dorsal parts of the
neural tube to further [...] notochord when transplanted to be adjacent to the
[...] to the neural tube, suggesting the ZPA makes the same signal as the
notochord. In keeping with this view, Shh was found [...] signal from notochord
to floor plate [...] vertebrate hedgehog genes [...]

[...] from adjacent cells, binds to the PTC receptor, inactivates it, and
thereby prevents [...] between PTC and HH.

Relevant Literature

Descriptions of patched, by itself or its role with hedgehog, may be found in
Hooper and Scott, Cell 59, 751-765 (1989); Nakano et al., Nature 341, 508-513
(1989) (both of which also describe the sequence for Drosophila patched); Simcox
et al., Development 107, 715-722 (1989); Hidalgo and Ingham, Development 110,
291-301 (1990); Phillips et al., Development 110, 105-114 (1990); [...] (1993);
Krauss et al., Cell 75, 1431-1444 (1993); [...] (1993); Tabata and Kornberg,
Cell 76, 89-102 (1994); Heemskerk & DiNardo, Cell 76, 449-460 (1994); Roelink
et al., Cell 76, 761-775 (1994); and a short review article by






Ingham, Current Biology 4, 347-350 (1994). The sequence for the Drosophila 5'
non-coding region was reported to the GenBank, accession number M28418,
referred to in Hooper and Scott (1989), supra. See also Forbes et al.,
Development 1993 Supplement, 115-124.

SUMMARY OF THE INVENTION 
Methods for isolating patched genes, particularly mammalian patched genes,
including the mouse and human patched genes, as well as invertebrate patched genes
and sequences, are provided. The methods include identification of patched genes
from other species, as well as members of the same family of proteins. The subject
genes provide methods for producing the patched protein, where the genes and
proteins may be used as probes for research, diagnosis, binding of hedgehog protein
for its isolation and purification, gene therapy, as well as other utilities.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a graph having a restriction map of about 10 kbp of the 5' region
upstream from the initiation codon of the Drosophila patched gene and bar graphs of
constructs of truncated portions of the 5' region joined to β-galactosidase, where the
constructs are introduced into fly cell lines for the production of embryos. The
expression of β-gal in the embryos is indicated in the right-hand table during early
and late development of the embryo. The greater the number of +'s, the more
intense the staining.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Methods are provided for identifying members of the patched (ptc) gene
family from invertebrate and vertebrate, e.g. mammalian, species, as well as the
entire cDNA sequence of the mouse and human patched gene. Also, sequences for
invertebrate patched genes are provided. The patched gene encodes a
transmembrane protein having a large number of transmembrane sequences.
In identifying the mouse and human patched genes, primers were employed
to move through the evolutionary tree from the known Drosophila ptc sequence.
Two primers are employed from the Drosophila sequence with appropriate
restriction enzyme linkers to amplify portions of genomic DNA of a related
invertebrate, such as mosquito. The sequences are selected from regions which are
not likely to diverge over evolutionary time and are of low degeneracy.
The regions are within the first 1.5 kb, usually within the first 1 kb, of the
coding portion of the cDNA, conveniently in the first hydrophilic loop of the
protein. Employing the polymerase chain reaction (PCR) with the primers, a band
can be obtained from mosquito genomic DNA. The band may then be amplified and
used in turn as a probe. One may use this probe to screen a cDNA library from an
organism in a different branch of the evolutionary tree, such as a butterfly. By
screening the library and identifying sequences which hybridize to the probe, a
portion of the butterfly patched gene may be obtained. One or more of the
resulting clones may then be used to rescreen the library to obtain an extended
sequence, up to and including the entire coding region, as well as the non-coding
5'- and 3'-sequences. As appropriate, one may sequence all or a portion of the
resulting cDNA coding sequence.
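
The notion of "low degeneracy" used above can be made concrete with a short calculation. The following is a minimal sketch, not part of the described method: it counts how many distinct oligonucleotides a degenerate primer stands for, using the standard IUPAC ambiguity codes. The function name and the example string (the degenerate portion of a primer encoding the amino acids K-V-H-Q-L-W, without any linker) are illustrative assumptions.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        # Each ambiguous position multiplies the count by its number of options.
        total *= len(IUPAC[base])
    return total


# Illustrative input (an assumption): degenerate region encoding K-V-H-Q-L-W.
example = "AARGTNCAYCARYTNTGG"
print(degeneracy(example))  # 2 * 4 * 2 * 2 * 2 * 4 = 256
```

A lower count means fewer primer species compete in the reaction, which is one reason stretches rich in amino acids with few codons (such as W, C, F, H, Q) are attractive primer sites.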

One may then screen a genomic or cDNA library of a species higher in the
evolutionary scale with appropriate probes from one or both of the prior sequences.
Of particular interest is screening a genomic library of a distantly related
invertebrate, e.g. beetle, where one may use a combination of the sequences
obtained from the previous two species, in this case, the Drosophila and the
butterfly. By appropriate techniques, one may identify specific clones which bind to
the probes, which may then be screened for cross hybridization with each of the
probes individually. The resulting fragments may then be amplified, e.g. by
subcloning.

By having all or parts of the 4 different patched genes, in the present
illustrative example Drosophila (fly), mosquito, butterfly and beetle, one can
compare the patched genes for conserved nucleotide sequences. In this way, one
can develop probes where at each site at least 2 of the sequences have the same
nucleotide, and where the site varies such that each species has a unique
nucleotide, inosine may be used, which binds to all 4 nucleotides.
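
The probe design rule in the preceding paragraph can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually used: the input is taken to be already aligned and of equal length, the four toy sequences are invented, and when two different nucleotides are each shared by two species the sketch simply keeps the first one encountered.

```python
from collections import Counter


def consensus_probe(aligned):
    """Build a probe from equal-length aligned nucleotide sequences.

    A site shared by at least two sequences keeps the most common nucleotide;
    a site where every sequence differs is written as inosine ("I").
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    probe = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        probe.append(base if count >= 2 else "I")
    return "".join(probe)


# Invented toy alignment standing in for fly, mosquito, butterfly and beetle.
toy = ["ACGTA", "ACGAC", "ATGCG", "ACGGT"]
print(consensus_probe(toy))  # ACGII
```

The last two columns differ in all four toy sequences, so they become inosine; every other column has a nucleotide shared by at least two sequences.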

Either PCR may be employed using primers or, if desired, a genomic library
from an appropriate source may be probed. With PCR, one may use a cDNA
library or use reverse transcriptase-PCR (RT-PCR), where mRNA is available from
the tissue. Usually, where fetal tissue is employed, one will employ tissue from the
first or second trimester, preferably the latter half of the first trimester or the second
trimester, depending upon the particular host. The age and source of tissue will
depend to a significant degree on the ability to surgically isolate the tissue based on
its size, the level of expression of patched in the cells of the tissue, the accessibility
of the tissue, the number of cells expressing patched and the like. The amount of
tissue available should be large enough so as to provide for a sufficient amount of
mRNA to be usefully transcribed and amplified. With mouse tissue, limb bud of
from about 10 to 15 dpc (days post conception) may be employed.

In the primers, the complementary binding sequence will usually be at least
14 nucleotides, preferably at least about 17 nucleotides and usually not more than
about 30 nucleotides. The primers may also include a restriction enzyme sequence
for isolation and cloning. With RT-PCR, the mRNA may be enriched in accordance
with known ways, reverse transcribed, followed by amplification with the
appropriate primers. (Procedures employed for molecular cloning may be found in
Molecular Cloning: A Laboratory Manual, Sambrook et al., eds., Cold Spring
Harbor Laboratories, Cold Spring Harbor, NY, 1988). Particularly, the primers may
conveniently come from the N-terminal proximal sequence or other conserved
region, such as those sequences where at least five amino acids are conserved out of
eight amino acids in three of the four sequences. This is illustrated by the sequences
(SEQ ID NO:11) HTPLDCFWEG, (SEQ ID NO:12) LTVGG, and (SEQ ID NO:13)
PFFWEQY. Resulting PCR products of expected size are subcloned and may be
sequenced if desired.
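
The conservation criterion above (at least five of eight amino acids conserved in three of the four sequences) can be sketched as a sliding-window scan. This is one possible reading of the criterion, offered as an assumption: a column counts as conserved when at least three sequences share the same residue, and a window qualifies when at least five of its eight columns are conserved. The flanking residues in the toy alignment are invented; only the HTPLDCFWEG core is taken from the text.

```python
from collections import Counter


def column_conserved(column, min_seqs):
    """True if at least `min_seqs` sequences share one residue (gaps ignored)."""
    counts = Counter(c for c in column if c != "-")
    return bool(counts) and counts.most_common(1)[0][1] >= min_seqs


def conserved_windows(aligned, window=8, min_conserved=5, min_seqs=3):
    """Start indices of windows with at least `min_conserved` conserved columns."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    flags = [column_conserved(col, min_seqs) for col in zip(*aligned)]
    return [
        start
        for start in range(length - window + 1)
        if sum(flags[start:start + window]) >= min_conserved
    ]


# Toy alignment: invented flanks around a core resembling HTPLDCFWEG.
toy = [
    "MKHTPLDCFWEGAA",
    "QRHTPLDCFWEGST",
    "LSHTPLDCFWEGVC",
    "PTHSPLDCYWEGDE",
]
print(conserved_windows(toy))                   # every window qualifies
print(conserved_windows(toy, min_conserved=8))  # only fully conserved windows
```

Raising `min_conserved` narrows the candidates to the fully conserved core, which is where primers of minimum degeneracy would be sought.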

The cloned PCR fragment may then be used as a probe to screen a cDNA
library of mammalian tissue cells expressing patched, where hybridizing clones may
be isolated under appropriate conditions of stringency. Again, the cDNA library
should come from tissue which expresses patched, which tissue will come within the
limitations previously described. Clones which hybridize may be subcloned and
rescreened. The hybridizing subclones may then be isolated and sequenced or may
be further analyzed by employing RNA blots and in situ hybridizations in whole and
sectioned embryos. Conveniently, a fragment of from about 0.5 to 1 kbp of the
N-terminal coding region may be employed for the Northern blot.

The mammalian gene may be sequenced and, as described above, conserved
regions identified and used as primers for investigating other species. The
N-terminal proximal region, the C-terminal region or an intermediate region may be
employed for the sequences, where the sequences will be selected having minimum
degeneracy and the desired level of conservation over the probe sequence.

The DNA sequence encoding PTC may be cDNA or genomic DNA or a fragment
thereof, particularly complete exons from the genomic DNA. It may be
isolated as the sequence substantially free of wild-type sequence from the
chromosome, may be a 50 kbp fragment or smaller fragment, or may be joined to
heterologous or foreign DNA, which may be a single nucleotide, an oligonucleotide of
up to 50 bp, which may be a restriction site or other identifying DNA for use as a
primer, probe or the like, or a nucleic acid of greater than 50 bp, where the nucleic
acid may be a portion of a cloning or expression vector, comprise the regulatory
regions of an expression cassette, or the like. The DNA may be isolated, purified
being substantially free of proteins and other nucleic acids, be in solution, or the
like.

The subject gene may be employed for producing all or portions of the
patched protein. The subject gene or fragment thereof, generally a fragment of at
least 12 bp, usually at least 18 bp, may be introduced into an appropriate vector for
extrachromosomal maintenance or for integration into the host. Fragments will
usually be immediately joined at the 5' and/or 3' terminus to a nucleotide or
sequence not found in the natural or wild-type gene, or joined to a label other than a
nucleic acid sequence. For expression, an expression cassette may be employed,
providing for a transcriptional and translational initiation region, which may be
inducible or constitutive, the coding region under the transcriptional control of the
transcriptional initiation region, and a transcriptional and translational termination
region. Various transcriptional initiation regions may be employed which are
functional in the expression host. The peptide may be expressed in prokaryotes or
eukaryotes in accordance with conventional ways, depending upon the purpose for
expression. For large production of the protein, a unicellular organism or cells of a
higher organism, e.g. eukaryotes such as vertebrates, particularly mammals, may be
used as the expression host, such as E. coli, B. subtilis, S. cerevisiae, and the like.
In many situations, it may be desirable to express the patched gene in a mammalian
host, whereby the patched protein will be transported to the cellular membrane for
various studies. The protein has two parts which provide for a total of six
transmembrane regions, with a total of six extracellular loops, three for each part.
The character of the protein has similarity to a transporter protein. The protein has
two conserved glycosylation signal triads.

The subject nucleic acid sequences may be modified for a number of
purposes, particularly where they will be used intracellularly, for example, by being
joined to a nucleic acid cleaving agent, e.g. a chelated metal ion, such as iron or
chromium for cleavage of the gene; as an antisense sequence; or the like.
Modifications may include replacing oxygen of the phosphate esters with sulfur or
nitrogen, replacing the phosphate with phosphoramide, etc.

With the availability of the protein in large amounts by employing an
expression host, the protein may be isolated and purified in accordance with
conventional ways. A lysate may be prepared of the expression host and the lysate
purified using HPLC, exclusion chromatography, gel electrophoresis, affinity
chromatography, or other purification technique. The purified protein will generally
be at least about 80% pure, preferably at least about 90% pure, and may be up to
100% pure. By pure is intended free of other proteins, as well as cellular debris.

The polypeptide may be used for the production of antibodies, where short
fragments provide for antibodies specific for the particular polypeptide, whereas
larger fragments or the entire protein allow for the production of antibodies over the
surface of the polypeptide or protein, where the protein may be in its natural
conformation.

Antibodies may be prepared in accordance with conventional ways, where
the expressed polypeptide or protein may be used as an immunogen, by itself or
[...]

[...] RNA is present in high amounts in the brain and lung, as well as in moderate
amounts in the kidney and liver. Weak signals are detected in heart, spleen, skeletal
muscle and testes.

In mouse embryos, ptc mRNA is present at 7 dpc. Using in situ
hybridization, ptc is present at high levels along the neural axis of 8.5 dpc embryos.
By 11.5 dpc, ptc can be detected in developing lung buds and gut, consistent with its
Northern profile. In addition, the gene is present at high levels in the ventricular
zone of the central nervous system as well as in the zona limitans of the
prosencephalon. ptc is also strongly transcribed in the perichondrium, condensing
[...] somites, a region which is prospective sclerotome and eventually forms bone in
the vertebral column. ptc is present in a wide range of tissues of endodermal,
mesodermal, as well as ectodermal origin, evidencing its fundamental role in many
aspects of embryonic development, including the condensation of cartilage, the
patterning of limbs, the differentiation of lung tissue, and the generation of neurons.
The patched nucleic acid may be used for isolating the gene from various
mammalian sources of interest, particularly primate, more particularly human, or
from domestic animals, both pet and farm, e.g. lagomorpha, rodentia, porcine,
bovine, feline, canine, ovine, equine, etc. By using probes, particularly labeled
probes of DNA sequences, of the patched gene, one may be able to isolate mRNA
or genomic DNA, which may then be used for identifying mutations, particularly
those associated with genetic diseases, such as spina bifida, limb defects, lung defects,
problems with tooth development, liver and kidney development, peripheral nervous
system development, and other sites where the patched gene is involved in regulation.
The subject probes can also be used for identifying the level of expression in cells
associated with the testis to determine the relationship with the level of expression
and sperm production.

The gene or fragments thereof may be used as probes for identifying the 5'
non-coding region comprising the transcriptional initiation region, particularly the
enhancer regulating the transcription of patched. By probing a genomic library,
particularly with a probe comprising the 5' coding region, one can obtain fragments
comprising the 5' non-coding region. If necessary, one may walk the fragment to
[...]

[...] particles, e.g. magnetic particles, and the like. Specific binding molecules include
pairs, such as biotin and streptavidin, digoxin and antidigoxin, etc. For the specific
binding members, the complementary member would normally be labeled with a
molecule which provides for detection, in accordance with known procedures. The
assays may be used for detecting the presence of molecules which bind to the
patched gene or PTC protein, in isolating molecules which bind to the patched gene,
for measuring the amount of patched, either as the protein or the message, for
identifying molecules which may serve as agonists or antagonists, or the like.

Various formats may be used in the assays. For example, mammalian or
invertebrate cells may be designed where the cells respond when an agonist binds to
PTC in the membrane of the cell. An expression cassette may be introduced into
the cell, where the transcriptional initiation region of patched is joined to a marker
gene, such as β-galactosidase, for which a substrate forming a blue dye is available.
A 1.5 kb fragment that responds to PTC signaling has been identified and shown to
regulate expression of a heterologous gene during embryonic development. When
an agonist binds to the PTC protein, the cell will turn blue. By employing a
competition between an agonist and a compound of interest, absence of blue color
formation will indicate the presence of an antagonist. These assays are well known
in the literature. Instead of cells, one may use the protein in a membrane
environment and determine binding affinities of compounds. The PTC may be
bound to a surface and a labeled ligand for PTC employed. A number of labels
have been indicated previously. The candidate compound is added with the labeled
ligand in an appropriate buffered medium to the surface bound PTC. After an
incubation to ensure that binding has occurred, the surface may be washed free of
any non-specifically bound components of the assay medium, particularly any
non-specifically bound labeled ligand, and any label bound to the surface determined.
Where the label is an enzyme, a substrate producing a detectable product may be used.
The label may be detected and measured. By using standards, the binding affinity of
the candidate compound may be determined.

The availability of the gene and the protein allows for investigation of the
development of the fetus and the role patched and other molecules play in such
development. By employing antisense sequences of the patched gene, where the
[...]

[...] affect growth and development in mammals, is also controlled by PTC. Since
members of both the TGF-beta and Wnt families are expressed in mice in places
close to or overlapping with patched, the common regulation provides an opportunity
in treating cancer. Also, for repair and regeneration, proliferation competent cells
making PTC protein can find use to promote regeneration and healing for damaged
tissue, which tissue may be regenerated by transfecting cells of damaged tissue with
the ptc gene and its normal transcription initiation region or a modified transcription
initiation region. For example, PTC may be useful to stimulate growth of new teeth
by engineering cells of the gums or other tissues where PTC protein was expressed
during an earlier developmental stage or is currently expressed.

Since Northern blot analysis indicates that ptc is present at high levels in
adult lung tissue, the regulation of ptc expression or binding to its natural ligand
may serve to inhibit proliferation of cancerous lung cells. The availability of the
gene encoding PTC and the expression of the gene allows for the development of
agonists and antagonists. In addition, PTC is central to the ability of neurons to
differentiate early in development. The availability of the gene allows for the
introduction of PTC into host diseased tissue, stimulating the fetal program of
division and/or differentiation. This could be done in conjunction with other genes
which provide for the ligands which regulate PTC activity or by providing for
agonists other than the natural ligand.

The availability of the coding region for various ptc genes from various
species allows for the isolation of the 5' non-coding region comprising the promoter
and enhancer associated with the ptc genes, so as to provide transcriptional and
post-transcriptional regulation of the ptc gene or other genes, which allow for
regulation of genes in relation to the regulation of the ptc gene. Since the ptc gene is
autoregulated, activation of the ptc gene will result in activation of transcription of a
gene under the transcriptional control of the transcriptional initiation region of the
ptc gene. The transcriptional initiation region may be obtained from any host
species and introduced into a heterologous host species, where such initiation region
is functional to the desired degree in the foreign host. For example, a fragment of
from about 1.5 kb upstream from the initiation codon, up to about 10 kb, preferably
up to about 5 kb, may be used to provide for transcriptional initiation regulated by
the PTC protein, particularly the Drosophila 5'-non-coding region (GenBank
accession no. M28418).



The following examples are offered by way of illustration and not by way of limitation.

I. PCR on Mosquito (Anopheles gambiae) Genomic DNA

PCR primers were based on amino acid stretches of fly PTC that were not
likely to diverge over evolutionary time and were of low degeneracy. Two such
primers (P2R1 (SEQ ID NO:14): GGACGAATTCAARGTNCAYCARYTNTGG;
P4R1 (SEQ ID NO:15): GGACGAATTCCYTCCCARAARCANTC) (the
underlined sequences are Eco RI linkers) amplified an appropriately sized band from
mosquito genomic DNA using the PCR. The program conditions were as follows:
94 °C 4 min.; 72 °C Add Taq;
[49 °C 30 sec.; 72 °C 90 sec.; 94 °C 15 sec.] 3 times
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold

This band was subcloned into the EcoRV site of pBluescript II and sequenced using
the USB Sequenase kit.
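
As a rough planning aid, the cycling program above can be written down as data and its programmed hold time summed. This is only a sketch under assumptions: the temperatures and times are as read from the program listing, the "Add Taq" pause and the final 4 °C hold have no stated duration and are left out, and instrument ramping time is ignored.

```python
# Each step is (temperature in deg C, seconds); each stage is (steps, repeats).
PROGRAM = [
    ([(94, 240)], 1),                      # initial denaturation, 4 min
    ([(49, 30), (72, 90), (94, 15)], 3),   # 3 initial cycles
    ([(94, 15), (50, 30), (72, 90)], 35),  # 35 main cycles
    ([(72, 600)], 1),                      # final extension, 10 min
]


def programmed_seconds(program):
    """Sum the programmed hold times, ignoring ramping and manual pauses."""
    return sum(
        repeats * sum(seconds for _, seconds in steps)
        for steps, repeats in program
    )


total = programmed_seconds(PROGRAM)
print(f"{total} s = {total / 60:.1f} min")  # 5970 s = 99.5 min
```

The actual run will take longer than this figure, since ramping between temperatures typically adds a noticeable amount per cycle.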



II.

Using the mosquito PCR product (SEQ ID NO:7) as a probe, a 3 day
embryonic Precis coenia λgt10 cDNA library (generously provided by Sean
Carroll) was screened. Filters were hybridized at 65 °C overnight in a solution
containing 5x SSC, 10% dextran sulfate, 5x Denhardt's, 200 µg/ml sonicated
salmon sperm DNA, and 0.5% SDS. Filters were washed in 0.1X SSC, 0.1% SDS
at room temperature several times to remove nonspecific hybridization. Of the
100,000 plaques initially screened, [...] which corresponded to the N-terminus of
butterfly [...]. The library filters were rescreened and 3 additional clones
(L5, L7, L8) were isolated which encompassed the remainder of the ptc coding
sequence. The full length sequence of butterfly ptc (SEQ ID NO:3) was determined
by ABI automated sequencing.

III. Screen of a Tribolium (Beetle) Genomic Library with Mosquito PCR Product
and 1000 bp Fragment from the Butterfly Clone

A λgem11 genomic library from Tribolium castaneum (gift of Rob Dennell)
was probed with a mixture of the mosquito PCR product (SEQ ID NO:7) and a
BstXI/EcoRI fragment of L2. Filters were hybridized at 55 °C overnight and
washed as above. Of the 75,000 plaques screened, 14 clones were identified and the
SacI fragment of T8 (SEQ ID NO:1), which cross-hybridized with the mosquito and
butterfly probes, was subcloned into pBluescript.

IV. PCR on Mouse cDNA Using Degenerate Primers Derived from Regions
Conserved in the Four Insect Homologues
Two degenerate PCR primers (P4REV (SEQ ID NO:16):
GGACGAATTCYTNGANTGYTTYTGGGA; P22 (SEQ ID NO:17):
CATACCAGCCAAGCTTGTCIGGCCARTGCAT) were designed based on a
comparison of PTC amino acid sequences from fly (Drosophila melanogaster) (SEQ
ID NO:6), mosquito (Anopheles gambiae) (SEQ ID NO:8), butterfly (Precis
coenia) (SEQ ID NO:4), and beetle (Tribolium castaneum) (SEQ ID NO:2). I
represents inosine, which can form base pairs with all four nucleotides. P22 was
used to reverse transcribe RNA from 12.5 dpc mouse limb bud (gift from David
Kingsley) for 90 min at 37 °C. PCR using P4REV (SEQ ID NO:16) and P22 (SEQ
ID NO:17) was then performed on 1 µl of the resultant cDNA under the following
conditions:

94 °C 4 min.; 72 °C Add Taq;
[94 °C 15 sec.; 50 °C 30 sec.; 72 °C 90 sec.] 35 times
72 °C 10 min.; 4 °C hold
PCR products of the expected size were subcloned into the TA vector (Invitrogen)
and sequenced with the Sequenase Version 2.0 DNA Sequencing Kit (U.S.B.).
Using the cloned mouse PCR fragment as a probe, 300,000 plaques of a
mouse 8.5 dpc λgt10 cDNA library (a gift from Brigid Hogan) were screened at
65 °C as above and washed in 2x SSC, 0.1% SDS at room temperature. 7 clones
were isolated, and three (M2, M4, and M8) were subcloned into pBluescript II.
200,000 plaques of this library were rescreened using first a 1.1 kb EcoRI fragment
from M2 to identify 6 clones (M9-M16) and secondly a mixed probe containing the
most N-terminal (XhoI fragment from M2) and most C-terminal sequences
(BamHI/BglII fragment from M9) to isolate 5 clones (M17-M21). M9, M10, M14
and M17-21 were subcloned into the EcoRI site of pBluescript II (Stratagene).



V.

Northerns:

A mouse embryonic Northern blot and an adult multiple tissue Northern blot
(obtained from Clontech) were probed with a 900 bp EcoRI fragment from an
N-terminal coding region of mouse ptc. Hybridization was performed at 65 °C in 5x
SSPE, 10x Denhardt's, 100 µg/ml sonicated salmon sperm DNA, and 2% SDS.
After several short room temperature washes in 2x SSC, 0.05% SDS, the blots were
washed at high stringency in 0.1X SSC, 0.1% SDS at 50 °C.
In situ hybridization of sections:

7.75, 8.5, 11.5, and 13.5 dpc mouse embryos were dissected in PBS and
frozen in Tissue-Tek medium at -80 °C. 12-16 µm frozen sections were cut,
collected onto VectaBond (Vector Laboratories) coated slides, and dried for 30-60
minutes at room temperature. After a 10 minute fixation in 4% paraformaldehyde in
PBS, the slides were washed 3 times for 3 minutes in PBS, acetylated for 10 minutes
in 0.25% acetic anhydride in triethanolamine, and washed three more times for 5
minutes in PBS. Prehybridization (50% formamide, 5X SSC, 250 µg/ml yeast
[...]) was carried
out for 6 hours at room temperature in 50% formamide/5x SSC humidified
chambers. The probe, which consisted of 1 kb from the N-terminus of ptc, was
added at a concentration of 200-1000 ng/ml into the same solution used for
prehybridization, and then denatured for five minutes at 80 °C. Approximately 75
µl of probe were added to each slide and covered with Parafilm. The slides were
incubated overnight at 65 °C in the same humidified chamber used previously. The
following day, the probe was washed successively in 5X SSC (5 minutes, 65 °C),
0.2X SSC (1 hour, 65 °C), and 0.2X SSC (10 minutes, room temperature). After
five minutes in buffer B1 (0.1 M maleic acid, 0.15 M NaCl, pH 7.5), the slides were
blocked for 1 hour at room temperature in 1% blocking reagent (Boehringer-
Mannheim) in buffer B1, and then incubated for 4 hours in buffer B1 containing the
DIG-AP conjugated antibody (Boehringer-Mannheim) at a 1:5000 dilution. Excess
antibody was removed during two 15 minute washes in buffer B1, followed by five
minutes in buffer B3 (100 mM Tris, 100 mM NaCl, 5 mM MgCl2, pH 9.5). The
antibody was detected by adding an alkaline phosphatase substrate (350 µl 75 mg/ml
X-phosphate in DMF, 450 µl 50 mg/ml NBT in 70% DMF in 100 ml of buffer B3)
and allowing the reaction to proceed overnight in the dark. After a brief rinse in 10
mM Tris, 1 mM EDTA, pH 8.0, the slides were mounted with Aquamount (Lerner
Laboratories).

VI. Drosophila 5'-transcriptional initiation region β-gal constructs.

A series of constructs were designed that link different regions of the ptc
promoter from Drosophila to a LacZ reporter gene in order to study the cis
regulation of the ptc expression pattern. See Fig. 1. A 10.8 kb BamHI/BspMI
fragment comprising the 5'-non-coding region of the mRNA at its 3'-terminus was
obtained and truncated by restriction enzyme digestion as shown in Fig. 1. These
expression cassettes were introduced into Drosophila lines using a P-element vector
(Thummel et al., Gene 74, 445-456 (1988)), which were injected into embryos,
providing flies which could be grown to produce embryos. (See Spradling and
Rubin, Science (1982) 218, 341-347 for a description of the procedure.) The vector
used a pUC8 background into which was introduced the white gene to provide for
yellow eyes, portions of the P-element for integration, and the constructs were
inserted into a polylinker upstream from the LacZ gene. The resulting embryos
were stained using antibodies to LacZ protein conjugated to HRP and the embryos
developed with OPD dye to identify the expression of the LacZ gene. The staining
pattern is described in Fig. 1, indicating whether there was staining during the early
and late development of the embryo.

VII. Isolation of a Mouse ptc Gene

Homologues of fly PTC (SEQ ID NO:6) were isolated from three insects:
mosquito, butterfly and beetle, using either PCR or low stringency library screens.
[...] ptc from mosquito genomic DNA that corresponded to the first hydrophilic
loop of [...] was used to screen a butterfly λGT10 cDNA
library. Of 100,000 plaques screened, five overlapping clones were isolated and
[...] (SEQ ID NO:4) is 1,311 amino acids long and overall has 50% amino acid
identity (71% similarity) [...] across the coding sequence. The mosquito PCR clone
(SEQ ID NO:7) and a corresponding fragment of butterfly cDNA were used to screen
a beetle λgem11 genomic library. Of the plaques screened, 14 clones were
identified. A [...] corresponding regions of fly and
butterfly PTC respectively.

20 o fl »^ m ^ ,rftefWi ^ h0m0, ^ ta ^^Mrophilic.o<» 

tad*^. Tnesepnmerswereusedtoisolateme 
mouse homologue usini RT tra _ .. . 

gue using RT-PCR „„ embryonic limb bud UNA. An appropriately 

fragmen* „f mousey cDNA , , moose ^ ^ *• 

From about 300,000 plaoues, n done, wen, identified and of Z, 7 f»T 
gapping ONA, wnicn eompr* „ of fc 
In both the embryonic and adult Northern blots, the ptc probe detects a single 
8 kb message. Further exposure does not reveal any additional minor bands. 
Developmentally, ptc mRNA is present in low levels as early as 7 dpc and becomes 
quite abundant by 11 and 15 dpc. While the gene is still present at 17 dpc, the 
Northern blot indicates a clear decrease in the amount of message at this stage. In 
the adult, ptc RNA is present in high amounts in the brain and lung, as well as in 
moderate amounts in the kidney and liver. Weak signals are detected in heart, 
spleen, skeletal muscle, and testes. 

Northern analysis indicates that ptc mRNA is present at 7 dpc, while there is 
no detectable signal in sections from 7.75 dpc embryos. This discrepancy is 
explained by the low level of transcription. In contrast, ptc is present at high levels 
along the neural axis of 8.5 dpc embryos. By 11.5 dpc, ptc can be detected in the 
developing lung buds and gut, consistent with its adult Northern profile. In 
addition, the gene is present at high levels in the ventricular zone of the central 
nervous system, as well as in the zona limitans of the prosencephalon. ptc is also 
strongly transcribed in the condensing cartilage of 11.5 and 13.5 dpc limb buds, as 
well as in the ventral portion of the somites, a region which is prospective 
sclerotome and eventually [...]. ptc is present in a 
wide range of tissues of endodermal, mesodermal and ectodermal origin, 
supporting its fundamental role in embryonic development. 

VIII. Isolation of the Human ptc Gene 

To isolate human ptc (hptc), 2 x 10^5 plaques from a human lung cDNA library 
(HU022a, Clontech) were screened with a 1 kbp mouse ptc fragment, M2-2. 
Filters were hybridized overnight at reduced stringency (60°C in 5X SSC, 10% 
dextran sulfate, 5X Denhardt's, 0.2 mg/ml sonicated salmon sperm DNA, and 0.5% 
SDS). Two positive plaques (H1 and H2) were isolated, the inserts cloned into 
pBluescript, and upon sequencing, both contained sequence highly similar to the 
mouse ptc homolog. To isolate the 5' end, an additional 6 x 10^5 plaques were 
screened in duplicate with M2-3 EcoR I and M2-3 Xho I (containing 5' untranslated 
sequence of mouse ptc) probes. Ten plaques were purified and of these, 6 inserts 
were subcloned into pBluescript. To obtain the full coding sequence, H2 was fully 
sequenced and H14, H20, and H21 were partially sequenced. The 5.1 kbp of human ptc 
sequence (SEQ ID NO:18) contains an open reading frame of 1447 amino acids 
(SEQ ID NO:19) that is 96% identical and 98% similar to mouse ptc. The 5' and 3' 
untranslated sequences of human ptc (SEQ ID NO:18) are also highly similar to 
mouse ptc (SEQ ID NO:09), suggesting conserved regulatory sequence. 
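
An open reading frame of 1447 amino acids needs (1447 + 1) x 3 = 4344 nucleotides including the stop codon, which fits within the roughly 5.1 kbp of cloned sequence. The following is a minimal sketch of how a longest open reading frame might be located in a cDNA string. It assumes the standard genetic code and scans only the three forward-strand frames; `longest_orf`, `min_orf_nt` and the toy sequence are hypothetical illustrations and are not part of the disclosed sequences or of the method actually used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def min_orf_nt(n_amino_acids):
    """Nucleotides needed to encode a protein plus its stop codon."""
    return (n_amino_acids + 1) * 3


def longest_orf(dna):
    """Return the longest ATG-to-stop translation in the three forward frames."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        current = None  # protein being extended, or None when outside an ORF
        for i in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks ambiguous codons
            if current is None:
                if aa == "M":
                    current = "M"
            elif aa == "*":
                if len(current) > len(best):
                    best = current
                current = None
            else:
                current += aa
    return best


# Toy sequence (hypothetical), not taken from the sequence listing.
print(longest_orf("CCATGGCCTGGCAGAAGTAGTT"))
print(min_orf_nt(1447))
```

Only reading frames closed by a stop codon are reported, so a truncated clone lacking its 3' end would return a shorter result than the true coding region.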

IX. Comparison of Mouse, Human, Fly and Butterfly Sequences 

The deduced mouse PTC protein sequence (SEQ ID NO:10) has about 38% 
identical amino acids to fly PTC over about 1,200 amino acids. This amount of 
conservation is dispersed through much of the protein excepting the C-terminal 
region. The mouse protein also has a 50 amino acid insert relative to the fly 
protein. Based on the sequence conservation of PTC and the functional conservation 
of hedgehog between fly and mouse, one concludes that ptc functions similarly in 
the two organisms. A comparison of the amino acid sequences of mouse (mptc) 
(SEQ ID NO:10), human (hptc) (SEQ ID NO:19), butterfly (bptc) (SEQ ID NO:4) 
and Drosophila (ptc) (SEQ ID NO:6) is shown in Table 1. 
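
Percent identity and similarity figures of the kind quoted above are computed over the aligned, ungapped positions of two sequences. The sketch below shows one simple way to do this. The similarity grouping is an assumption (a basic conservative-substitution grouping); the specification does not state which scoring scheme was used, and the two input strings are hypothetical toy sequences rather than rows of Table 1.

```python
# Assumed conservative-substitution groups; not taken from the specification.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def identity_and_similarity(seq_a, seq_b, gap="-"):
    """Return (% identity, % similarity) over ungapped aligned columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # columns containing a gap are not scored
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy aligned pair (hypothetical).
identity, similarity = identity_and_similarity("MKLV-DE", "MRIVADQ")
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Different choices for the denominator (for example, the full alignment length rather than ungapped columns) give somewhat different percentages, so figures from different sources are only comparable when the convention is known.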

TABLE 1 

Alignment of human, mouse, fly, and butterfly PTC homologs 

WC „ DRDSWRVPDTO^I^f^^^^fHRA-APDRDyLHRPSYCDA 
IJ>SAIiOASRVHVYMTORQWKLEHLCyKSGELlTET- GYMDQIIEYLY J^E^pIS? 
LEVLVT^^AVKVHI*yDTEWGLRDMCNMPSTPS FEGI YYIEQILRHLI PCS I *^£J*JJJ;^E 

* *. * * .* * .-* * *. • ** ** 

GAKLQSGTAYLLGKPPLR----WTNFDPLEFLEELK KIWQWSOTO^KKAEV 

GAKLQSGTAYLLGKPPLR WTNFDPLEFIXELK M ^ Q ^^^^ 

GSK^GPDYPIYVPHLKHKI^HLNPLEVVEEVK-KL KFQFPLSTIEAYMKRAGI 

* . • * ** ..* -* * 

GHGYMDRPCIJtfPADPDCPATAPNKNSTKPLDMALVL 

GSGYMEKPCLNPLNPNCPDTAPhnOJSTQPPDV^ 
TSAYMl^PC^ 

# ** .***.* .*.** ♦*♦**.* *.- *.***. *** 

• • • • • • 

HOSVAONSTOK VLS FTTTT LDD I LKS FS DVSVI RVAS GY LLMLAYAC LTMLRW-D C 

SotSSw VL P FTTTT LDD I LKS FS DV S VI RVAS GY LLMLAY ACLTMLRW- D C 

RKI^SGSVSSAYSFYPFSTSTI^ILGKFSEVSLKMIII^Mn«.IYVAVTLIQWRDP 

* ♦ *** ** * . * • * * • • • 



SKSQGJWGLACT/IJjVAI^VAAGLGLCSLIGISFNAATTQVLPFIJU^GVGVDDV 

S^QGAVGIAWI^VAI£VAAGLGU:SIJGISF1^TTQVL^ 

VRGQSSVGVAGVLI^CCFSTAAGLGI^AliGIVFNAASTQWPFIJU^GIX^ 

IRSQAGVGIWS^LI^SITVAAGLGFCALLGIPFNASSTQIVPFLALGLGVQDM^ 

* •* ••*•*. .*.** •**..**..******.**•♦•*•• •• 

^S~--£SklilSgpsilfsacstagsf 

VEQAGD- - VPREERTGLVLKKS GLSVLLAS LCNVMAFLAAALLP I PAFRV FCLQAAI LLL 
* ..* **. * * *• **..*.**•• * 

FNFAMVLLIFPAI LSMDLYRREDRRLDI FCCFTSPCVSRVTQVEPQMf TDTHDNTRYS PP 

SNLAAALLVFPAMIS1J3LRRRTAGRADIFCCCF-PVWKEQPKWPPVLPLNNNNGR 

FKLGSILLVFPAMISLDLRRRSAARADLLCCLM-P ESP LPKKKIPER 

**.***..*.** ** * *••** * 
PPYSSHSrAHETOITOQSTVQLRTEYOPHTHVYYITAEPBSEISVQPVTVTODT 
^S^ETHI^S^^ 

AKTRKNDKTHRID-TTRQPLDPDVS 

* • • 

ESTSSTRDLLSQFSDSSLHCIjEPPCTKWTI>SSFAEKHYAPFIJ/KPKAKV^ 
ESTSSTOLMQFSDSSUJCIXPPCTKOTMSFAEKOTAPFLLK^ 

nxpGSS - HSLASF SLATFAFQHYTPFLMRSWVKFLTVMGFLAALI 

ENVTKT CCL-SV SLTKWAKNQYAPFIMRPAVKVTSMLALIAVIL 

* # . « . «..•.**.... * 

VSLYGTTRVROGlJJtTDlVPllETREYDFIAAQFKYFSrrNMYIVTQWl-DYPNIQHLLYD 
VSLYGTTOVRDGLDLTDIVPRCTREYDFIAAQFKYFSFYNMYIV^ 
SSLYA^RX^ClSlIDLVPKDSNEHKFLDAQTRLFGFYSMYAVTQGNFE 
TSWC»TK^GU)LTDIVPEOTDEHEFI^RQEKYFGFYWAVTQGNFEYPTNQKLLYX 

.. * *• * ♦ * 
The identity of ten other clones recovered from the mouse library is not 
determined. These cDNAs cross-hybridize with mouse ptc sequence, while differing 
as to their restriction maps. These genes encode a family of proteins related to the 
patched protein. Alignment of the human and mouse nucleotide sequences, which 
includes coding and noncoding sequence, reveals 89% identity. 

In accordance with the subject invention, mammalian patched genes, including 
the mouse and human genes, are provided which allow for high level production of 
the patched protein, which can serve many purposes. The patched protein may be 
used in screening for agonists and antagonists, for isolation of its ligand, 
particularly hedgehog, more particularly Sonic hedgehog, and for assaying for the 
transcription of the ptc mRNA. The protein or fragments thereof may be used to 
produce antibodies specific for the protein or specific epitopes of the protein. In 
addition, the gene may be employed for investigating embryonic development, by 
screening fetal tissue, preparing transgenic animals to serve as models, and the like. 

All publications and patent applications cited in this specification are herein 
incorporated by reference as if each individual publication or patent application were 
specifically and individually indicated to be incorporated by reference. 

Although the foregoing invention has been described in some detail by way of 
illustration and example for purposes of clarity of understanding, it will be readily 
apparent to those of ordinary skill in the art in light of the teachings of this invention 
that certain changes and modifications may be made thereto without departing from 
the spirit or scope of the appended claims. 
SEQUENCE LISTING 

(1) GENERAL INFORMATION: 

(i) APPLICANT: THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIVERSITY 

(ii) TITLE OF INVENTION: Patched Genes and Their Use 
(iii) NUMBER OF SEQUENCES: 19 
(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert 

(B) STREET: Four Embarcadero Center, Suite 3400 

(C) CITY: San Francisco 

(D) STATE: CA 

(E) COUNTRY: US 
(F) ZIP: 94111 



(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US95/ 

(B) FILING DATE: 06-OCT-1995 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Rowland, Bertram I 

(B) REGISTRATION NUMBER: 20015 

(C) REFERENCE /DOCKET NUMBER: a60190-l 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 415-781-1989 

(B) TELEFAX: 415-398-3249 



(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 736 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

AACNNCNNTN NATGGCACCC CCNCCCAACC TTTNNNCCNN NTAANCAAAA NNCCCCNTTT 60 

HATACCCCCT NTAANANTTT TCCACCNNNC NNAAANNCCN CTCNANACNA NGNAAANCCN 120 

TTTTTNAACC CCCCCCACCC GGAATTCCNA NTNNCCNCCC CCAAATTACA ACTCCAGNCC 180 

AAAATTNANA MAATTGOTCC TAACCTAACC NATNGTTGTT ACGGTTTCCC CCCCCAAATA 240 

CATGCACTCG CCCCAACACT TGATOCTTCC CCTTCCAATA AGAATAAATC TCCTCATATT 300 

AAACAAGCCN AAACCTTTAC AAACTCTTCT ACAATTAATC GGCGAACACG AACTGTTCGA 360 

ATTCTGGTCT GGACATTACA AAGTGCACCA CATCGCATGG AACCAGGAGA AGCCCACAAC 420 

CGTACTGAAC GCCTGGCAGA AGAAGTTCGC ACAGCTTGGT GGTTGGCGCA AGGAGTAGAG 480 

TGAATGGTGG TAATTTTTGG TTGTTCCAGG AGGTGGATCG TCTGACGAAG AGCAAGAAGT 540 

CGTCGAATTA CATCTTCGTG ACGTTCTCCA CCGCCAATTT GAACAAGATG TTGAAGGAGG 600 

CGTCGAANAC GGACGTGGTG AAGCTGGGGG TGGTGCTGGG GGTGGCGGCG GTGTACGGGT 660 

GGGTGGCCCA GTCGGGGCTG GCTGCCTTGG GAGTGCTGGT CTTNGCCNCC TNCHATTCCC 720 

CCTATAGTNA GNCGTA 736 
(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 107 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Xaa Pro Pro Pro Asn Tyr Asn Ser Xaa Pro Lys Xaa Xaa Xaa Leu Val 
1 5 10 15 

Leu Thr Pro Xaa Val Val Thr Val Ser Pro Pro Lys Tyr Met His Trp 
20 25 30 

Pro Glu His Leu Ile Val Ala Val Pro Ile Arg Ile Asn Leu Val Ile 
35 40 45 

Leu Asn Lys Pro Lys Ala Leu Gln Thr Val Val Gln Leu Met Gly Glu 
50 55 60 

His Glu Leu Phe Glu Phe Trp Ser Gly His Tyr Lys Val His His Ile 
65 70 75 80 

Gly Trp Asn Gln Glu Lys Ala Thr Thr Val Leu Asn Ala Trp Gln Lys 
85 90 95 

Lys Phe Ala Gln Val Gly Gly Trp Arg Lys Glu 
100 105 

(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5187 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



PCT/US9S/1J233 



(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 
-XCTOTC CCCOCACCCO GAOTCCCCGC 

CCAOGCOOGC CCGOAGCCCO OGCCGCCGOC GGCAACATC* CCTCCCCTCO TAACGCCGCC 
CCGGCCCTCG GCAOOCACOC COOCOGCOOO AGOCGCACAC CCACCGCGCG ACCCCACCGC 
CCCOCGCCGO ACCOGGACTA TCTGCACCCG CCCACCTACT OCGACCCCGC CTTCCCTCTC 
CAGCAGATTT CCAAGGGOAA GGCTAC^GC CGOAAAGCOC CCCTCTCCCT OAOAOCGAAG 
-CAGAOAC TCTTATTTAA ACTGGGTTGT TACATTCAAA AGAACTCCCO CAAGTTTTTG 
CXTGTGGGTC TCCTCATATT TGGOOCCTTC CCTGTGGGAT TAAACOCAGC TAATCTCCAC 
ACCAACGTCO AGOAGCTGTO GOTGOAAGTT GOTOOACOAO TOAOTCOAGA ATTAAATTAT 
-CCCTCAOA ACATAGGAGA AGAGGCTATC TTTAATCCTC AACTCATGAT ACAGACTCCA 
AAAGAAGAAG CCGCTAATGT TCTGACCACA GAGGCTCTCC TCCAACACCT GGACTCACCA 
CCCAGGCCA GTCCTGTCCA CGTCTACATG TATAACAGGC AATGGAAGTT CCAACATTTG 
TCCTACAAAT CAGGCGAACT TATCACGCAG ACAGCTTACA TGGATCACAT AATAGAATAC 
CTTTAOCCTT GCTTAATCAT TACACCTTTG GACTCCTTCT GGGAAGGGGC AAAGCTACAG 
TCCGGGACAC CATACCTCCT AGGTAACOCT CCTTTACGGT GGACAAACTT TGACCCCTTG 
GAATTCCTAG AAGACTTAAA GAAAATAAAC TACCAAGTGG ACACCTCCGA GGAAATGCTG 
AATAAACCCG AAGTTGCOCA TGGGTACATG GACCCGCCTT GCCTCAACCC AGCCGACCCA 
GATTGCCCTG CCACAGCCCC TAACAAAAAT TCAACCAAAC CTCTTGATCT GGCCCTTGTT 

TTGAATCGTG CATCTCAAGG TTTATCtace 

TTTATCCAGC AAOTATATCC ATTGCCAGGA GGAGTTGATT 

GTGGGTGGTA CCGTCAACAA TGCCACTCCA »m»™ 

CTGCA ***CTTCTCA CCGCTCACCC CCTCCAAACC 

" CITCCMI ""— «—« c«™ 
TACGTGGAGC TGGTTCATCA AAGTCTCCCC CCAAACTCCA CTCAAAAGOT CCTTCCCTTC 1320 

ACAACCACGA CCCTGCACOA CATCCTAAAA TCCTTCTCTG ATGTCACTGT CATCOGACTG 1380 

GCCACCCCCT ACCTACTCAT GCTTGCCTAT CCTGTTTAA CCATCCTGCG CTGGGACTGC 1440 

TCCAAGTCCC AGGCTGCCGT CCGCCTGGCT CCCGTCCTGT TGGTTGCGCT GTCAGTGGCT 1500 

GCAGGATTGG GCCTCTGCTC CTTOATTGCC ATTTCTTTTA ATGCTGOGAC AACTCAGGTT 1560 

TTGCCGTTTC TTGCTCTTGG TGTTGGTGTG GATGATGTCT TCCTCCTGGC CCATGCATTC 1620 

AGTGAAACAC CACAGAATAA GAGGATTCCA TTTGAGCACA GGACTGGGGA GTGCCTCAAG 1680 

CGCACCGGAG CCAGCGTGGC CCTCACCTCC ATCACCAATG TCACCCCCTT CTTCATGGCC 1740 

GCATTGATCC CTATCCCTGC CCTGCCAGOG TTCTCCCTCC AGGCTCCTGT GGTGGTGGTA 1800 

TTCAATTTTG CTATGGTTCT CCTCATTTTT CCTGCAATTC TCAGCATGCA TTTATACAGA 1860 

CGTGAGGACA GAAGATTGGA TATTTTCTGC TGTTTCACAA GCCCCTGTGT CAGCAGGGTG 1920 

ATTCAAGTTG AGCCACAGGC CTACACAGAG CCTCACACTA ACACCCGGTA CAGCCCCCCA 1980 

CCCCCATACA CCAGCCACAG CTTCGCCCAC GAAACCCATA TCACTATGCA GTCCACCGTT 2040 

CAGCTCCGCA CACAGTATGA CCCTCACACG CACGTGTACT ACACCACCGC OGAGCCACGC 2100 

TCTGAGATCT CTGTACAGCC TGTTACCGTC ACCCAGGACA ACCTCAGCTG TCAGAGTCCC 2160 

CAGAGCACCA GCTCTACCAG GGACCTGCTC TCCCAGTTCT CAGACTCCAG CCTCCACTGC 2220 

CTCGAGCCCC CCTGCACCAA GTGGACACTC TCTTCGTTTC CAGACAAGCA CTATCCTCCT 2280 

TTCCTCCTGA AACCCAAAGC CAAGGTTGTG GTAATCCTTC TTTTCCTGGG CTTGCTGGCG 2340 

GTCAGCCTTT ATGGCACCAC CCGAGTGAGA GACGGGCTGG ACCTCACGGA CATTCTTCCC 2400 

CCGGAAACCA GAGAATATGA CTTCATAGCT CCCCAGTTCA ACTACTTCTC TTTCTACAAC 2460 

ATGTATATAC TCACCCAGAA AGCAGACTAC CCGAATATCC AGCACCTACT TTACGACCTT 2520 

CATAAGAGTT TCAGCAATGT GAAGTATGTC ATGCTGCAGG AGAACAAGCA ACTTCCCCAA 2580 

ATGTGGCTCC ACTACTTTAG AGACTGGCTT CAAGGACTTC AGGATGCATT TGACAGTGAC 2640 

TGGGAAACTG GGAGGATCAT GCCAAACAAT TATAAAAATG GATCAGATGA CGGGGTCCTC 2700 

GCTTACAAAC TCCTGGTGCA GACTGGCAGC CCAGACAAGC CCATCGACAT TAGTCAGTTG 2760 

ACTAAACAGC GTCTCGTAGA CGCAGATGGC ATCATTAATC CGAGCGCTTT CTACATCTAC 2820 

CTGACCGCTT GGGTCAGCAA CGACCCTCTA GCTTACCCTG CCTCCCAGGC CAACATCOGG 2880 

CCTCACCGGC CGGAGTGGGT CCATCACAAA GCCGACTACA TGCCAGAGAC CAGGCTGAOA 2940 

ATCCCAGCAG CAGAGCCGAT CGAGTACGCT CAGTTCCCTT TCTACCTCAA OGGCCTACGA 3000 
OACACCTCAO ACTTTOTGOA ACCCATAGAA AAAGTCAGAO TCATCTOTAA CAACTATAOC 3060 

AGCCTCGGAC TGTCCAOCTA CCCCAATCGC TACCCCTTCC TOTTCTCGCA CCAATACATC 3120 

ACCCTCCCCC ACTGGCTCCT GCTATCCATC ACOCTGGTGC TCCCCTCCAC CTTTCTACTC 3180 

TGCCCACTCT TCCTCCTCAA CCCCTGGACC CCOGCCATCA TTCTCATGGT CCTCCCTCTC 3240 

ATGACOGTTO AGCTCTTTGO CAWATCCCC CTCATTCGCA TCAACCTOAO TOCTCTCCCT 3300 

GTCCTCATCC TGATTCCATC WTWCCMC CCAOTCGACT TCACCGTCCA CCTCCCTTIC 3360 

GCCTTTCTOA CAGCCATTGO OGACAAOAAC CACACCGCTA TGCTCGCTCT GGAACACATG 3420 

TTTCCTOCCG TTCTCGAOGO TGCTCTGTCC ACTCTCCTGC GTCTACTQAT CCTTCCACCC 3480 

TCCGAATTTC ATTTCATTGT CACATACTTC TTTGCCCTCC TCCCCATTCT CACCCTCTTC 3540 

GGGGTTCTCA ATGGACTGGT TCTCCTCCCT GTCCTCTTAT CCTTCTTTCC ACCGTGTCCT 3600 

CAGCTGTCTC CAGOCAATGC CCTAAACCOA CTOCCCACTC CTTCCCCTGA CCCGCCTCCA 3660 

AOTGTCGTCC GGTTTGCCGT CCCTCCTOGT CACACCAACA ATGCGTCTCA TTCCTCCOAC 3720 

TCGCACTACA GCTCTCAGAC CACGCWSTCT CCCATCAGTG AOGACCTCAG GCAATACGAA 3780 

GCACACCACG GTGCOGGAGG CCCTGCCCAC CAAGTGATTG TGGAAGCCAC AGAAAACCCT 3840 

CTCTTTGCCC GGTCCACTGT CCTCCATCCG 6ACTCCAGAC ATCACCCTCC CTTGACCCCT 3900 

CGCCAACAGC CCCACCTGGA CTCTGGCTCC TTGTCCCCTG CAGCCCAAGG CCAGCAGCCT 3960 

OGAAGGGATC CCCCTAGAGA "WCTTOCGC CCACCCCCCT ACAGACCGCO CAGAGACGCT 4020 

TTTGAAATTT CTACTGAAGG OCATTCTCGC CCTAGCAATA GGGACCGCTC AGGGCCCCGT 4080 

GGGGCCCGTT CTCACAACCC TOGGAACCCA ACOTCCACCG CCATCGGCAG CTCTGWCCC 4140 

AGCTACTCCC ACCCCATCAC CACTGTGACG GCTTCTGCTT OGGTGACTCT TGCTCTGCAT 4200 

CCOCOGCCTG GACCTGGGOG OUCCCCCGA GGGGG6CCCT GTCCAGGCTA TGAGAGCTAC 4260 

CCTGAGACTG ATCAOGGGGT ATTTGACGAT CCTCATGTCC CTTTTCATCT CAGGTOTGAC 4320 

AGGAGCGACT CAAAGCTGGA GGTCATAGAG CTACAOGACO TGCAATGTGA GGAOAGGCCO 4380 

TGGGGGAGCA CCTCCAACTG AGCGTAATTA AAATCTGAAG CAAAGAGGCC AAAGATTOGA 4440 

AAGCCCCCCC CCCACCTCTT TCCAOAACTG CTTCAAGAOA ACTGCITGGA ATTATGCGAA 4500 

CCCACTTCAT TGTTACTGTA ACTOATTGTA TTATTKKOTG AAATATTTCT ATAAATATTT 4560 

AARAGGTGTA CACATGTAAT ATACATGGAA ATOCTCTACA GTCTATTTCC TCGGGCCTCT 4620 

CCACTCCTGC CCCAGAGTGG GGAGACCACA OGGGCCCTTT CCCCTOTGTA CATTCGTCTC 4680 

TCTGCCACAA CCAAGCTTAA CTTACTTTTA AAAAAAATCT CCCAGCATAT GTOCCTGCTG 4740 
CTTAAATATT GTATAATTTA CTTCTATAAT TCTATCCAAA TATTGCTTAT GTAATAGCAT 4800 

TATTTCTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA TATCACAACC CTGTGGTAGC 4860 

ATGAATTGTT ACTGTTAACT TTTGAACACG CTATGCGTGG TAATTGTTTA ACOAGCAGAC 4920 

ATGAAGAAAA CAGGTTAATC CCAOTGGCTT CTCTAGCGGT AGTTGTATAT GGTTOGCATG 4980 

GGTGGATGTG TOTGTGCATG TGACTTTCCA ATGTACTGTA TTGTCGTTTC TTGTTGTTGT 5040 

TGCTGTTGTT GTTCATTTTG CTGTTTTTGG TTCCTTTGTA TGATCTTAGC TCTGGCCTAG 5100 

GTGGGCTCGG AAGGTCCAGG TCTTTTTCTG TCGTGATGCT GGTGGAAACG TGACCCCAAT 5160 

CATCTGTCCT ATTCTCTGGG ACTATTC 5187 

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1311 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

Met Val Ala Pro Asp Ser Glu Ala Pro Ser Asn Pro Arg Ile Thr Ala 
1 5 10 15 

Ala His Glu Ser Pro Cys Ala Thr Glu Ala Arg His Ser Ala Asp Leu 
20 25 30 

Tyr Ile Arg Thr Ser Trp Val Asp Ala Ala Leu Ala Leu Ser Glu Leu 
35 40 45 

Glu Lys Gly Asn Ile Glu Gly Gly Arg Thr Ser Leu Trp Ile Arg Ala 
50 55 60 

Trp Leu Gln Glu Gln Leu Phe Ile Leu Gly Cys Phe Leu Gln Gly Asp 
65 70 75 80 

Ala Gly Lys Val Leu Phe Val Ala Ile Leu Val Leu Ser Thr Phe Cys 
85 90 95 

Val Gly Leu Lys Ser Ala Gln Ile His Thr Arg Val Asp Gln Leu Trp 
100 105 110 

Val Gln Glu Gly Gly Arg Leu Glu Ala Glu Leu Lys Tyr Thr Ala Gln 
115 120 125 

Ala Leu Gly Glu Ala Asp Ser Ser Thr His Gln Leu Val Ile Gln Thr 
130 135 140 
ill ^ ^ ™ — *~ Leu Hie Pro ciy Ala ta • 

150 ***Y *** Leu Leu oiu 

His Leu Lye Val Val His Ala Al« ,k * ^ 
165 Ala Ala Thr Arc Val Thr Val Hi. Met Tyr 

175 

A " P 118 Glu Trp Ar» Le„ Lya U|u _ ^ ^ 

180 y <*• »yr S« Pro Ser II. Pro 

. 190 

*»p Phe Clu ciy Tvr Hi. m- T , 

M5 Tyr Hi. 81 . clu ne ^ ^ 

205 

a - «, - „ s „. ^ w s ai . _ ty> m> iy> 

Leu Gl„ Trp Thr Hie Leu A8 _ Pro . „, 2 *° 

Aen Pro Leu Clu val val clu Clu Val Lys 

255 

Lya Leu Lya Ph . cln Phft 

260 f *J Th * "e Clu Ala Tyr Met Lya 

. 270 

- - «j u. „ s .. M . „. t iy . ty< ^ cy> ^ ^ ^ 

285 

Thr Asp Pro Hio Cva p M B , ^ 

eye Pro Al. Thr Ala pr<J A ,„ ^ ^ ^ 

300 

- - »-P v. t „. „. 0l0 ^ set au ^ ^ ^ ^ ^ ^ 

«. - * «.< «. „„ 01 „ 0l „ £ J VIl Cly wy AU - 

*° - s - «• - , n «. ^ _ „„ " v>i 

350 

a oly elu a «- - «. * «. ai . 

- a ~ „* «. „. „ oin mu iy> ~ M> ^ 

380 

a - - ... s Ly . , ta „. M . s m ^ ty< n> ^ 

Thr ser Cly ser Val Ser Ser Al. ^ ^ 
405 Ma S.r Ph . tyr Pro Ph. Ser Thr 

415 

Ser Thr t*u Aon Asp n* . 

42o P He Leu ciy Lya Phe Ser Cl« Val ^ ^ ^ 

& 430 

445 

- «• - ^ ^ , M n . „ s „ ol> u> oiy v>i My ^ 
450 455 460 

Ala Gly Val Leu Leu Leu Ser Ile Thr Val Ala Ala Gly Leu Gly Phe 
465 470 475 480 

Cys Ala Leu Leu Gly Ile Pro Phe Asn Ala Ser Ser Thr Gln Ile Val 
485 490 495 

Pro Phe Leu Ala Leu Gly Leu Gly Val Gln Asp Met Phe Leu Leu Thr 
500 505 510 

His Thr Tyr Val Glu Gln Ala Gly Asp Val Pro Arg Glu Glu Arg Thr 
515 520 525 

Gly Leu Val Leu Lys Lys Ser Gly Leu Ser Val Leu Leu Ala Ser Leu 
530 535 540 

Cys Asn Val Met Ala Phe Leu Ala Ala Ala Leu Leu Pro Ile Pro Ala 
545 550 555 560 

Phe Arg Val Phe Cys Leu Gln Ala Ala Ile Leu Leu Leu Phe Asn Leu 
565 570 575 

Gly Ser Ile Leu Leu Val Phe Pro Ala Met Ile Ser Leu Asp Leu Arg 
580 585 590 

Arg Arg Ser Ala Ala Arg Ala Asp Leu Leu Cys Cys Leu Met Pro Glu 
595 600 605 

Ser Pro Leu Pro Lys Lys Lys Ile Pro Glu Arg Ala Lys Thr Arg Lys 
610 615 620 

Asn Asp Lys Thr His Arg Ile Asp Thr Thr Arg Gln Pro Leu Asp Pro 
625 630 635 640 

Asp Val Ser Glu Asn Val Thr Lys Thr Cys Cys Leu Ser Val Ser Leu 
645 650 655 

Thr Lys Trp Ala Lys Asn Gln Tyr Ala Pro Phe Ile Met Arg Pro Ala 
660 665 670 

Val Lys Val Thr Ser Met Leu Ala Leu Ile Ala Val Ile Leu Thr Ser 
675 680 685 

Val Trp Gly Ala Thr Lys Val Lys Asp Gly Leu Asp Leu Thr Asp Ile 
690 695 700 

Val Pro Glu Asn Thr Asp Glu His Glu Phe Leu Ser Arg Gln Glu Lys 
705 710 715 720 

Tyr Phe Gly Phe Tyr Asn Met Tyr Ala Val Thr Gln Gly Asn Phe Glu 
725 730 735 

Tyr Pro Thr Asn Gln Lys Leu Leu Tyr Glu Tyr His Asp Gln Phe Val 
740 745 750 

Arg Ile Pro Asn Ile Ile Lys Asn Asp Asn Gly Gly Leu Thr Lys Phe 
755 760 765 
T,p L.u S.r L.u Ph. Arg ^ Trp Leu ^ ^ ^ ^ ^ ^ ^ 

775 780 

Asp Lye Clu Val Ala Ser Oly Cya n. n» n. «i «. - 
785 790 r * " e Thr oln °lu Tyr Trp Cya Lye 

795 800 

Ann Ala Sax Aap Glu Cly lie t.»i, »i. «. 

805 * Tyr LyB L-u v *l Gin »hr 

810 815 

Oly Hi. Val Aap Aan Pro He Aap Ly. S . r ^ u Ile Thr M . ^ 

825 830 

Arg Leu val Aap Ly. a. P Oly XI. lie Aan Pro Ly. Ala Ph. Tyr Aa„ 

840 84s 

Tyr Leu ser Ala Trp Ala ,hr Aa„ Aap Ala Leu Ala Tyr Cly Ala s.r 

855 860 
Jin Cly A.„ Lau Lye p ro cl „ pre Gln ^ ^ ^ ^ 

875 880 
Aap val Hla X— Clu XI. Lya ly8 Sat Set pr<J ^ ^ ^ ^ ^ 

Leu Pro Pho Tyr L*u Ser cly 
900 

X- II. Arg ser Val Arg Aap Leu cya Leu Lya Tyr Clu Ala Lya Cly 

920 925 

Lau pro Aan Ph. Pro s.r Cly XI. p ro Phe ^ phe Trp elu ^ 

935 940 
£« Tyr Lau Arg Thr Ser Leu L.u Leu Ala L.u Ala Cya Ala Leu Ala 

955 960 
Al. V.1 Ph. XI. Ala Val Mat Val L.u Leu Leu Aan Ala Trp Ala Ala 

970 975 

V.1 Leu val Thr L.u Al. L. u Ala Thr L.u Val Leu Cln Leu Leu Cly 

985 99Q 

Val Met Ala Leu Leu Cly Val Lya Lau s.r Al. Met Pro Ala Val L.u 

1000 1005 
L.u Va^Leu Al. XI. cly Are Cly Val Hia Ph. Thr Val Hi. Leu Cya 

1015 1020 

i-Ur »» v.! t,« s„ „. 01y „. ly . ^ „, M . ^ 

1035 1040 
Ala Leu ciu s.r V.^Leu Ala Pro Val Val Hia Cly Ala L.u Ala Ala 



885 890 895 

900 ' ^ A8P Thr *" S8r Ile «*■ ™* 

905 910 



1050 



1055 



Al. Leu Al. Ala o S.r Met Leu Al. Al. Ser Clu Cya Cly Ph. Val Al. 

1065 1070 
Axg Leu Ph. U« Arg Leu Leu L.u Aap XI. Val Ph. Leu « y Leu XI 
1075 1080 1085 

Asp Gly Leu Leu Phe Phe Pro Ile Val Leu Ser Ile Leu Gly Pro Ala 
1090 1095 1100 

Ala Glu Val Arg Pro Ile Glu His Pro Glu Arg Leu Ser Thr Pro Ser 
1105 1110 1115 1120 

Pro Lys Cys Ser Pro Ile His Pro Arg Lys Ser Ser Ser Ser Ser Gly 
1125 1130 1135 

Gly Gly Asp Lys Ser Ser Arg Thr Ser Lys Ser Ala Pro Arg Pro Cys 
1140 1145 1150 

Ala Pro Ser Leu Thr Thr Ile Thr Glu Glu Pro Ser Ser Trp His Ser 
1155 1160 1165 

Ser Ala His Ser Val Gln Ser Ser Met Gln Ser Ile Val Val Gln Pro 
1170 1175 1180 

Glu Val Val Val Glu Thr Thr Thr Tyr Asn Gly Ser Asp Ser Ala Ser 
1185 1190 1195 1200 

Gly Arg Ser Thr Pro Thr Lys Ser Ser His Gly Gly Ala Ile Thr Thr 
1205 1210 1215 

Thr Lys Val Thr Ala Thr Ala Asn Ile Lys Val Glu Val Val Thr Pro 
1220 1225 1230 

Ser Asp Arg Lys Ser Arg Arg Ser Tyr His Tyr Tyr Asp Arg Arg Arg 
1235 1240 1245 

Asp Arg Asp Glu Asp Arg Asp Arg Asp Arg Glu Arg Asp Arg Asp Arg 
1250 1255 1260 

Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg Asp Arg 
1265 1270 1275 1280 

Glu Arg Ser Arg Glu Arg Asp Arg Arg Asp Arg Tyr Arg Asp Glu Arg 
1285 1290 1295 

Asp His Arg Ala Ser Pro Arg Glu Lys Arg Gln Arg Phe Trp Thr 
1300 1305 1310 

(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4434 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 
OW "° M °" a °° Mm «»«~ TTCTOTGTTG 

— — . -caaaaca Aoacooecio M „ 

~»— . ococcctc ccctaa™ =™rrcccc tocc^cct cccccfcc. 

CGAGATACAG ATACATCTCT CATCCACCCC raar^ 

^ CACACCCTCC CACCCCTTCC CGACACACAC 

CGCGATCTGG ATTATTCTOS CAT^ACA X.CCCCC^ 

«— caca^ MIMAItte mcmmto oikcaoccc amemK 

TATCTOCQAT CAOTATTCCA .«c„C«C OAAACCCTC CC^CTCCOT CCAAAACCAC 

MCtmOCt GG CTATCCTO CTCC^CA COTcMool 
««CCC„. TCCACTCCAA SOXCCACCAO CCCC.^ « M 
CCOCAACXOO CCXACACCA CAAGACCATC CCCACAC ACXCCGCCAe 0CAICMcTO 

Cla ™ m <XhCCaaa ««— — CCXCO AXCCCCACCC CCCCCCC 
««=0. M ,00^ =CC=AC=CC ^AACC^C ACCCA^ CACCCAATCO 

— — caxccc^c —exec 

^ ~™ -CCC^. 

« wra0ITMe 0TOIMII> T>ceKcccr cmccmqja 

CTOCTOIM> ATATCAAACA AAACATOTCC 

GACGAAAAGA TCACC^A CTTCCAGACC Ctt^ ACATCAASCC 

°° mOIOOCT ,C "° OK " — = «cccaa™ cccacacc 

GCBCCOMOA MSACA ^ — — — — CCAXCCW C^AOOCTCC 
«=CT» M ~CC=AAOCA CATOCAC^C C^ASC rCATTGTGCa 

0CM *" m "™ ™C= »=xcc= TOT 0a „ 

ACCGAGAAGO AAATCTACCA CCAOTGCCAO CAC**™,.. 

wTGOCAG CACAACTACA ACGTCCACCA TCTTCCATCC 

AOCCAOOACA ACCCAOCGCA GOTTTWAAC GCCTCCCACC CCAAt™. 

CAACAGCTGC TACGTAAACA GTOGAGAATT ccnxnn*~. 

«*AGAATT GCCACCAACT ACOATATCTA CGTCTTCAOC 

TOGGCTCCAC TGGATCACAT CCTCCCCAAQ TWTrrr,^ 

"* iCCAAO TTCTCCCATC CCAGOCCCTT CTCCATTOIC 

ATCOCCCTCO COGTCAOCGT TTTCTATCee 

ATCCC TTTTGCACGC TCCTCCCCTC GACGGACCCC 
CTCC^GGCC AGAGCAGTGT CCGCG^ 

GCOGGATTCG GA^TCACC CCTCCTCCC, ATCGTTTTCA AXCCCC^c CGCTCCCTAT 
OCGCAGAGCA ATCGCCGCCA 6CAOA0CAAG COA^ AGAACCCCAG CACCCAGGTG 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
GTTCCCTTTT TGGCCCTTGG TCTGGGOCTC GATCACATCT TCATAGTGGO ACCGAGCATC 1800 

CTGTTCAGTG CCTGCAGCAC CGCAGGATCC TTCTTTGCGG COGCCTTTAT TCCCGTGCCG 1860 

GCTTTCAACG TATTCTGTCT GCAGGCTGCC ATCGTAATGT GCTCCAATTT GGCAGCGGCT 1920 

CTATTGGTTT TTCCGGCCAT GATTTOGTTG GATCTAOGGA GACGTACCGC CGGCAGGGCG 1980 

GACATCTTCT GCTGCTGTTT TCCGGTGTGG AAGGAACAGC CGAAGGTGGC ACCTCCGGTG 2040 

CTGCCGCTGA ACAACAACAA CGGGCOOGGG GCCOGGCATC CGAAGAGCTG CAACAACAAC 2100 

AGGGTGCCGC TGCCCOCCCA GAATCCTCTG CTGGAACAGA GGGCAGACAT CCCTGGGAGC 2160 

AGTCACTCAC TGGCGTCCTT CTCCCTGOCA ACCTTCGCCT TTCAGCACTA CACTCCCTTC 2220 

CTCATGCGCA GCTGGGTGAA GTTCCTGACC GTTATGGGTT TCCTCGOGGC CCTCATATCC 2280 

AGCTTGTATG CCTCCAOGCG CCTTCAGGAT GGCCTGGACA TTATTGATCT GGTGCCCAAG 2340 

CACAGCAACG AGCACAAGTT CCTGGATCCT CAAACTOGGC TCTTTGGCTT CTACAGCATG 2400 

TATGCGGTTA CCCAGGCCAA CTTTGAATAT CCCACCCAGC AGCAGTTGCT CAGGGACTAC 2460 

CATGATTCCT TTGTGOGGGT GCCACATGTG ATCAAGAATG ATAACGGTGG ACTGCCGGAC 2520 

TTCTGGCTGC TGCTCTTCAG CGAGTGGCTG CGTAATCTGC AAAAGATATT CGACGAGGAA 2580 

TACCGCGAOG GAOGGCTGAC CAAGGAGTGC TGGTTCCCAA ACGCCAGCAG CGATGCCATC 2640 

CTGGCCTACA AGCTAATCGT GCAAACCGGC CATGTGGACA ACCCCGTGGA CAAGGAACTG 2700 

GTGCTCACCA ATOGCCTGGT CAACAGCGAT GGCATCATCA ACCAACGCGC CTTCTACAAC 2760 

TATCTGTCGG CATGCGCCAC CAACGACGTC TTCGCCTACG GAGCTTCTCA CGGCAAATTG 2820 

TATCCGGAAC CGCGCCAGTA TTTTCACCAA CCCAACGAGT ACGATCTTAA GATACCCAAG 2880 

AGTCTGCCAT TGGTCTACGC TCAGATGCCC TTTTACCTCC ACGGACTAAC AGATACCTCG 2940 

CAGATCAAGA CCCTGATAGG TCATATTCGC GACCTGAGOG TCAAGTACGA GGGCTTOGGC 3000 

CTGCCCAACT ATCCATCGGG CATTCCCTTC ATCTTCTGGG AGCAGTACAT GACCCTGCGC 3060 

TCCTCACTGG CCATGATCCT GGCCTGCGTG CTACTCGCCG CCCTGGTGCT GGTCTCCCTG 3120 

CTCCTGCTCT CCGTTTOGGC CGCCGTTCTC GTGATCCTCA GCGTTCTGCC CTCGCTGGCC 3180 

CAGATCTTTG GGGCCATGAC TCTGCTGGGC ATCAAACTCT CGGCCATTCC GGCAGTCATA 3240 

CTCATCCTCA GOGTGGGCAT GATGCTGTGC TTCAATGTGC TGATATCACT GGGCTTCATG 3300 

ACATCCGTTG GCAACCGACA GOGCOGOGTC CAGCTGAGCA TGCAGATGTC CCTGGGACCA 3360 

CTTGTCCACG GCATGCTGAC CTCCGGAGTG GCCGTGTTCA TGCTCTCCAC GTCGCCCTTT 3420 

GAGTTTGTGA TCCGGCACTT CTGCTGGCTT CTGCTGGTGG TCTTATGOGT TGGCGCCTGC 3480 
AACACCCTTT TOOTGTTCCC CATCCTACTG AGCATGG** CACCCGACCC CCAGCTCGTC 3540 

CCGCTGGAGC ATCCAGACCG CATATCCACG CCCTCTCCCC TCCCCOTCCG CACCACCAAO 3600 

AGATCGGGCA AATCCTATGT GGTCCAGGCA TOCCCATCCT CCCGAOCCAC CTGCCAGAAG 3660 

TCGCATCACC ACCACCACAA AOACCTTAAT GATCCATCGC TCACCACCAT CAC0GAGCA6 3720 

CCGCAOTCGT CGAAGTCCAG CAACTCCTCC ATCCAGATGC CCAATCATTG CACCTACCAG 3780 

CCGCGCGAAC AGCGACCCCC CTCCTACOCC CCCCCCCCCC CCCCCTATCA CAAOCCCGCC 3840 

OOCCAGCAOC ACCACCACCA TCACCCCCCC CCCACAACCC CCCCOCCTCC CTTCCCGACG 3900 

GCCTATCCGC CCCACC^CA GAGCATCCTC CTCCACCCOO AOCTCACCCT COACACCACC 3960 

CACTCCCACA GCAACACCAC CAACCTCACC CCCACCOCCA ACATCAACGT CCACCTCCCC 4020 

ATCCCCGCCA CCCCGGTGCG CACCTATAAC TTTAOCAGTT AOCACTAOCA CTAGTTCCTG 4080 

TAGCTATTAG CAOGTATCTT TAGACTCTAG CCTAAGCCGT AACCCTATTT GTATCTGTAA 4140 

AATOGATTTC TCCACOGGGT CTGCTGAGGA TTTOGTTCTC ATGGATTCTC ATGGATTCTC 4200 

ATCGATGCTT AAATGGCATG OIAATTGCCA AAATATCAAT TTTTGTOTCT CAAAAAGATC 4260 

CATTAGCTTA TCGTTTCAAG ATACATTTTT AAAGACTCCG CCAGATATTT ATATAAAAAA 4320 

AATCCAAAAT CGACCTATCC ATGAAAATTG AAAACCTAAG CAGACCOGTA TGTATGTATA 4380 

TGTGTATGCA TGTTAGTTAA TTTCCOGAAG TCCGGTATTT ATAGCAGCTC CCTT 4434 

(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1285 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 

«et Asp Ar g Asp Ser Leu Pro Arg v.l Pro Asp Thr Hi. oi y ^ Val 

10 15 

v.a ». P «. j. u. .„ „ ^ ^ u> ^ ^ 
"* S" V " - — «■ -P X. «y L». «. ^ 01y 

40 45 

Sar Arg Thr Ala He Tyr Leu Are s ,. v.i »u „, 

50 V Arg s r Val Phe Gin ser His Leu Clu 
Thr Leu Gly Ser Ser Val Gln Lys His Ala Gly Lys Val Leu Phe Val 
65 70 75 80 

Ala Ile Leu Val Leu Ser Thr Phe Cys Val Gly Leu Lys Ser Ala Gln 
85 90 95 

Ile His Ser Lys Val His Gln Leu Trp Ile Gln Glu Gly Gly Arg Leu 
100 105 110 

Glu Ala Glu Leu Ala Tyr Thr Gln Lys Thr Ile Gly Glu Asp Glu Ser 
115 120 125 

Ala Thr His Gln Leu Leu Ile Gln Thr Thr His Asp Pro Asn Ala Ser 
130 135 140 

Val Leu His Pro Gln Ala Leu Leu Ala His Leu Glu Val Leu Val Lys 
145 150 155 160 

Ala Thr Ala Val Lys Val His Leu Tyr Asp Thr Glu Trp Gly Leu Arg 
165 170 175 

Asp Met Cys Asn Met Pro Ser Thr Pro Ser Phe Glu Gly Ile Tyr Tyr 
180 185 190 

Ile Glu Gln Ile Leu Arg His Leu Ile Pro Cys Ser Ile Ile Thr Pro 
195 200 205 

Leu Asp Cys Phe Trp Glu Gly Ser Gln Leu Leu Gly Pro Glu Ser Ala 
210 215 220 

Val Val Ile Pro Gly Leu Asn Gln Arg Leu Leu Trp Thr Thr Leu Asn 
225 230 235 240 

Pro Ala Ser Val Met Gln Tyr Met Lys Gln Lys Met Ser Glu Glu Lys 
245 250 255 

Ile Ser Phe Asp Phe Glu Thr Val Glu Gln Tyr Met Lys Arg Ala Ala 
260 265 270 

Ile Gly Ser Gly Tyr Met Glu Lys Pro Cys Leu Asn Pro Leu Asn Pro 
275 280 285 

Asn Cys Pro Asp Thr Ala Pro Asn Lys Asn Ser Thr Gln Pro Pro Asp 
290 295 300 

Val Gly Ala Ile Leu Ser Gly Gly Cys Tyr Gly Tyr Ala Ala Lys His 
305 310 315 320 

Met His Trp Pro Glu Glu Leu Ile Val Gly Gly Arg Lys Arg Asn Arg 
325 330 335 

Ser Gly His Leu Arg Lys Ala Gln Ala Leu Gln Ser Val Val Gln Leu 
340 345 350 

Met Thr Glu Lys Glu Met Tyr Asp Gln Trp Gln Asp Asn Tyr Lys Val 
355 360 365 

His His Leu Gly Trp Thr Gln Glu Lys Ala Ala Glu Val Leu Asn Ala 
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360 

Trp Gin Arg Asn fh Ser Arg Glu Val ci„ „» „ 

385 390 * V -l Clu Gin Leu Leu teg Lye cln 

Ser Arg He Al. Thr Asn Tyr Aap i ie Tyr Val ph. , 

40S * Z** val Ph « Ser Ser Ala Ale 



410 

" 415 
I*u Asp Aep He Leu Ale Lye P fa. Ser Hi- . - 

420 y Jjf Hi » p "» S« Ala Leu S er 11. 

430 

Val lie ciy Val Ala Val Thr Val ie u ^ Ala Bh ^ 

435 44Q ^ U ^ *!• Pha Cya Thr Leu Leu 

445 

**• a " 9 - p ~ ~ a «* »» - ~ m «, « „. w> 

~ u. t . u , ; . 5 „ tht A1 . „. oly ^ s „ u . 

r T «80 
LBU lBU C1 * »• V.1 Pha Aan Ala Leu Thr Al. Al. „ 

485 ™5 Ala A1 * ^r Ala Glu Ser 

495 

Asn Arg Arg Glu Gin Thr Lya Leu lie Leu r » 

500 Y **" Leu L y« Aan Ala Ser Thr Gin 

* 510 



~ v., jj. Ph . _ „. L . u ^ wy ^ ^ ^ ^ ^ 

525 

v * 1 s p - - - a - «• *. .„ Ihr AU Cly _ rh< 

540 

- «U „. , ta .J. p „ p „ M> ^ vu ^ ^ ^ 

555 560 

Cln Ala Al. ii. val Het Cy. s.r A8n ^ 

565 ZZZ * la A1 * **« I*u Leu Val 

- - «. £ a. s „ ^ „ p ^ u ^ ^ ^ ^ ^ ^ 

590 

- a >- a - v., ly . 01 „ „. ,„ 

605 

Val Ala Pro Pro Val Le u Pro & 

61o Leu Pro Leu Asn Asn Asn Asn ciy Arg ciy Ala 

620 

**g His Pro Lya Ser Cvm *. n * 

625 * ABn Afln Asn Ara Val x>^ r~ ~ 

b " 630 9 I" Pro ^ u Pro Ala Cln 

640 



Aen Pro Leu Leu Clu Cln a™ m » 

645 ^ Ala A8 P »• p « «y s.r Sar His Ser 

- a - - «. ». jj. th . M „ M . 

5 670 

Pne Leu Net Aro Ser Tm 0.1 , 

Trp V.1 g, Phe ^ Thr VM ^ ^ 

" 665 
Ala Ala u« Ile Ser Ser Leu Tyr Ala Ser Thr Arg Leu Gln Asp Gly
690

Leu Asp Ile Ile Asp Leu Val Pro Lys Asp Ser Asn Glu His Lys Phe
705 710 715

Leu Asp Ala Gln Thr Arg Leu Phe Gly Phe Tyr Ser Met Tyr Ala Val
725 730

Thr Gln Gly Asn Phe Glu Tyr Pro Thr Gln Gln Gln Leu Leu Arg Asp
740 745

Tyr His Asp Ser Phe Arg Val Pro His Val Ile Lys Asn Asp Asn Gly
755 760

Gly Leu Pro Asp Phe Trp Leu Leu Leu Phe Ser Glu Trp Leu Gly Asn
770 775

Leu Gln Lys Ile Phe Asp Glu Glu Tyr Arg Asp Gly Arg Leu Thr Lys
790

Glu Cys Trp Phe Pro Asn Ala Ser Ser Asp Ala Ile Leu Ala Tyr Lys
805 810

u. Val Gln Thr Gly His Val Asp Asn Pro Val Asp Lys Glu Leu
820 825

Val Leu Thr Asn Arg Leu Val Asn Ser Asp Gly Ile Ile Asn Gln Arg
835

Ala Phe Tyr Asn Tyr Leu Ser Ala Trp Ala Thr Asn Asp Val Phe Ala
850 855 860

Tyr Gly Ala Ser Gln Gly Lys Leu Tyr Pro Glu Pro Arg Gln Tyr Phe
865 870

His Gln Pro Asn Glu Tyr Asp Leu Lys H. Pro Lys Ser Leu Pro Leu
885 890

Val Tyr Ala Gln Met Pro Phe Tyr Leu His Gly Leu Thr Asp Thr Ser
900 905

Gln Ile Lys Thr Leu Ile Gly His Ile Arg Asp Leu Ser Val Lys Tyr
915 920

Glu Gly Phe Gly Leu Pro Asn Tyr Pro Ser Gly Ile Pro Phe Ile Phe
930 935

Trp Glu Gln Tyr Met Thr Leu Arg Ser Ser Leu Ala Met Ile Leu Ala
945

Cys Val Leu Leu Ala Ala Leu Val Leu Val Ser Leu Leu Leu Leu Ser

Val Trp Ala Ala Val Leu Val Ile Leu Ser Val Leu Ala Ser Leu Ala
980 985

Gln Ile Phe Gly Ala Met Thr Leu Leu Gly Ile Lys Leu Ser Ala Ile
995 1000 1005



Pro Ala Val II. Le u n e ^ Ser y x 

1010 101S aA CAy ^« Cye Phe Asn 

10 " 1020 

JU «- U. *,r L» ol y „. «. t thr s „ „ y A>n ^ ^ 

1035 104C 

V " ° ln - - - '« X 01 ' - - ~ - «r 

1050 1055 
Hot Ihr s.^, v., „. » al _ s „ fc ^ ^ ph> 

1065 1070 
SJ."* ~ ~ *" - ~ ~ «- cy. 

1080 1085 

Val Oly Ala Cya Aen Ser Leu Leu Val Phe Pro il- t r 

1090 , OP e Pro Ila ^ ^« Ser Met 

1095 1100 

V.1 «y Pro c.„ „. v<1 p „ ^ 81< ^ ^ n> 

1115 1120 
~ ». ,„ s„ ,„ ^ P „ v>1 ^ ly . ^ ^ 

1130 1135 

* ~ S."" - f~ ,~ «y s« =y. «» ly . 

1145 1150 

S " Bl ' usV" " U l " — - - — <te 

1160 1165 

He Thr Glu Clu Pro Gin Ser Trp Lys Ser Ser c * 

1170 117 c ^ Y r Ser Asn Ser He Gin 

1175 1180 

naV" - A " P *• P ~ **« «- «» ^ '» Ala ,„ 

1195 1200 

tyr «. M . Pto „. „. M . ^ u> Hn ^ ^ 

i21 ° 1215 
»t. «„ HI. 01. „ y ^ ,„ rar ,„ p „ f „ ph . ^ ^ 

1225 1230 
Ala Tyr Pro Pro oiu L«u ©In ser lis v.i ».i „ „ 

1235 STo In Pr ° Glu Val Thr 

1240 1245 

Si* - ft- ~ V.1 Thr «. th , 

1255 1260 

Ala Aan lis Lya Val clu L«u »i. »,„«. „ 

1265 1270 Cly Ala Val Ser 



1275 1280

Tyr Asn Phe Thr Ser
1285

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 345 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

AAGGTCCATC AGCTTTGGAT ACAGGAAGGT GGTTCGCTCG ACCATGAGCT AGCCTACACG 60 

CAGAAATCGC TCGGCGACAT GGACTCCTCC ACGCACCAGC TCCTAATCCA AACNCCCAAA 120 

GATATGGACG CCTCGATACT GCACCCGAAC CCGCTACTGA CGCACCTGGA CGTGGTGAAG 180 

AAAGCGATCT CGGTGACGGT GCACATGTAC CACATCACGT GGAGNCTCAA GGACATGTGC 240 

TACTCGCCCA GCATACCGAG NTTCGATACG CACTTTATCG AGCAGATCTT CGAGAACATC 300 

ATACCGTGCG CGATCATCAC GCCGCTGGAT TCCTTTTCCG AGGGA 345 
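
The relationship between a nucleotide listing such as SEQ ID NO:7 and its peptide listing can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and are not part of the disclosure. Codons containing an ambiguous base (N) are reported as `X`, which corresponds to Xaa in a peptide listing.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1 into one-letter amino acid codes.

    Whitespace between 10-base blocks is ignored. Any codon that is not a
    plain A/C/G/T triplet (for example one containing N) becomes 'X'.
    A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 60 bases of SEQ ID NO:7 as printed in the listing.
first_row = "AAGGTCCATC AGCTTTGGAT ACAGGAAGGT GGTTCGCTCG ACCATGAGCT AGCCTACACG"
print(translate(first_row))  # KVHQLWIQEGGSLDHELAYT
```

Where the translated residues differ from the printed peptide listing, the difference may reflect a transcription error in either listing rather than a feature of the sequence, so such a check is a screening aid only.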
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Lys Val His Gln Leu Trp Ile Gln Glu Gly Gly Ser Leu Glu His Glu
1 5 10 15

Leu Ala Tyr Thr Gln Lys Ser Leu Gly Glu Met Asp Ser Ser Thr His
20 25 30

Gln Leu Leu Ile Gln Thr Pro Lys Asp Met Asp Ala Ser Ile Leu His
35 40 45

Pro Asn Ala Leu Leu Thr His Leu Asp Val Val Lys Lys Ala Ile Ser
50 55 60

Val Thr Val His Met Tyr Asp Ile Thr Trp Xaa Leu Lys Asp Met Cys
65 70 75 80

Tyr Ser Pro Ser Ile Pro Xaa Phe Asp Thr His Phe Ile Glu Gln Ile
85 90 95
Phe Glu Asn Ile Ile Pro Cys Ala ti« ti- »u «

100 f« n * Thr >ro leu Afl P <*■ 

105 110 

Trp Glu Gly
115 

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5187 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
CGGTCTGTCA CCCGGAGCCG GACTCCCCGC CGGCCAGCAG CGTCCTCGCG ACCCCACCGC 60
CCAGGCCCGC CCGCAGCCCG CGGCGGCCGC GGCAACATGG CCTCGGCTGG TAACGCCGCC 120
CCGGCCCTGG GCAGGCAGGC CCGCCGCGGG AGGCGCACAC GGACCGGGGG ACCCCACCGC 180
CCCGCGCCGG ACCGGGACTA TCTGCACCGG CCCACCTACT GCGACGCCGC CTTCGCTCTG 240
GAGCAGATTT CCAAGGGGAA GCCTACTGGC OGGAAACCGC OGCTGTGGCT GAGAGCGAAG 300
TTTCAGAGAC TCTTATTTAA ACTGGGTTGT TACATTCAAA AGAACTCCCC CAAGTTTTTG 360
OTTGTGCCTC TCCTCATATT TGGGGCCTTC GCTGTGGGAT TAAAGCCAGC TAATCTCGAG 420
ACCAACGTGG AGGAGCTGTO GGTGGAACTT CGTCCAOGAG TGACTCGAGA ATTAAATTAT 480
ACCCGTCAGA AGATAGGAGA ACAGCCTATC TTTAATCCTC AACTCATCAT ACAGACTCCA 540
AAAGAAGAAG GOGCTAATGT TCTGACCACA CAGGCTCTCC TCCAACACCT GGACTCAGCA 600
CTCCAGGCCA GTCGTGTGCA CGTCTACATG TATAACACGC AATCCAAGTT GCAACATTTG 660
TGCTACAAAT CAGGGGAACT TATCACGGAG ACAGGTTACA TGGATCAGAT AATAGAATAC 720
CTTTACCCTT GCTTAATCAT TACACCTTTG OACTCCTTCT GGGAAGGGGC AAACCTACAG 780
TCCGGGACAG CATACCTCCT AGGTAAGCCT CCTTTACGCT GGACAAACTT TCACCCCTTG 840
GAATTCCTAC AAGAGTTAAA GAAAATAAAC TACCAAGTGG ACAGCTGGGA GGAAATGCTG 900
AATAAACCCC AAGTTGGCCA TCCCTACATC GACOGGCCTT GCCTCAACCC AGCCGACCCA 960
CATTCCCCTG CCACAGCCCC TAACAAAAAT TCAACCAAAC CTCTTCATCT GCCCCTWTT 1020
TTGAATGGTO GATGTCAAGC TTTATCCAGG AAGTATATGC ATTGGCACCA OGAGTTGATT 1080
CTCGCTCGTA COGTCAAGAA TCCCACIGGA AAACTTCTCA CCGCTCACGC CCTGCAAACC 1140
ATGTTCCAGT TAATGACTCC CAAGCAAATO TATGAACACT TCAGGGGCTA CGACTATGTC 1200
TCTCACATCA ACTGGAATGA AGAGAGGCGA CCCGCCATCC TGGAGGCCTG GCAGAGGACT 1260
TACGTCGAGG TGGTTCATCA AAGTGTCGCC CCAAACTCCA CTCAAAAGGT GCTTCCCTTC 1320
ACAACCACGA CCCTGGACGA CATCCTAAAA TCCTTCTCTG ATCTCACTCT CATCCGAGTC 1380
GCCAGCGCCT ACCTACTGAT GCTTGCCTAT GCCTGTTTAA CCATGCTGCG CTGGGACTGC 1440
TCCAAGTCCC ACGGTGCCGT GGGGCTGGCT GGCGTCCTGT TGGTTGCGCT CTCAGTGGCT 1500
GCAGGATTGC GCCTCTGCTC CTTGATTGGC ATTTCTTTTA ATGCTGOGAC AACTCAGGTT 1560
TTGCCGTTTC TTGCTCTTGG TGTTGGTGTG GATGATGTCT TCCTCCTGGC CCATGCATTC 1620
AGTGAAACAG GACAGAATAA CAGCATTCCA TTTGAGCACA GGACTGGGGA GTGCCTCAAG 1680
CGCACOGGAG CCAGCGTGGC CCTCACCTCC ATCAGCAATG TCACCGCCTT CTTCATGCCC 1740
GCATTGATCC CTATCCCTGC CCTGCGAGCG TTCTCCCTCC AGGCTGCTGT GGTGGTGGTA 1800
TTCAATTTTG CTATGGTTCT GCTCATTTTT CCTGCAATTC TCAGCATGGA TTTATACAGA 1860
OGTGAGGAGA GAAGATTGGA TATTTTCTGC TGTTTCACAA GCCCCTGTGT CAGCAGGGTG 1920
ATTCAAGTTG AGCCACAGGC CTACACAGAG CCTCACAGTA ACACCCGGTA CAGCCCCCCA 1980
CCCCCATACA CCAGCCACAG CTTCGCCCAC GAAACCCATA TCACTATGCA GTCCACCGTT 2040
CAGCTCCGCA CAGAGTATGA CCCTCACACG CACGTGTACT ACACCACCGC CGAGCGAOGC 2100
TCTGAGATCT CTGTACAGCC TGTTACOGTC ACCCAGGACA ACCTCAGCTG TCAGAGTCCC 2160
CAGAGCACCA GCTCTACCAG GGACCTGCTC TCCCAGTTCT CAGACTCCAO CCTCCACTGC 2220
CTCGAGCCCC CCTGCACCAA GTGGACACTC TCTTCGTTTG CAGAGAAGCA CTATGCTCCT 2280
TTCCTCCTGA AACCCAAAGC CAAGGTTGTG GTAATCCTTC TTTTCCTGGG CTTGCTGGGC 2340
GTCAGCCTTT ATCGGACCAC CCGAGTGACA GACGGGCTGG ACCTCACGGA CATTGTTCCC 2400
CGGGAAACCA GAGAATATGA CTTCATAGCT CCCCACTTCA AGTACTTCTC TTTCTACAAC 2460
ATGTATATAG TCACCCAGAA AGCAGACTAC CCGAATATCC AGCACCTACT TTACGACCTT 2520
CATAAGAGTT TCAGCAATGT GAAGTATGTC ATCCTGGAGG AGAACAACCA ACTTCCCCAA 2580
ATGTGGCTGC ACTACTTTAG AGACTGGCTT CAAGGACTTC AGGATGCATT TGACAGTGAC 2640
TGGGAAACTG GGAGCATCAT GCCAAACAAT TATAAAAATG GATCAGATCA CGGGGTCCTC 2700
GCTTACAAAC TCCTGGTGCA GACTGGCAGC OGAGACAAGC CCATCGACAT TAGTCAGTTG 2760
ACTAAACAGC GTCTGGTAGA CGCAGATGGC ATCATTAATC CGAGCGCTTT CTACATCTAC 2820
CTGACCGCTT CGGTCAGCAA CGACCCTCTA CCTTACGCTG CCTCCGAGGC CAACATCOGG 2880
°=t«cc«c cccacoc, mT0Km ioeauau!Ac ^

CW!Mra " "OTTCCCT TCTACCTCAA COCCCTACA 

cccc™, Ac™,™ agccatacaa aaa OI<5 a 0 a 0 
»=ct=««c Toraaoct, ccca,,^ raccccItcc 

"ccrooaoc actccccct gctatocatc „c«««c tggcctscac otttctactc 

TCCCC *" CT ICCTOT °" TOTCATSOT CCTOCCTCTC 

»»oo™ agctctttgq aKmmK c*»,tc«a 

0»«TC»«0 KATKCAXC 7QTTGQCATC «AC«c»C, TO.ncOT.C, CM^cm* 
QCCTTTCTCA CACCCATT,* OC.C^c ««» TQCKKm 

TTCreCACC SCTCT0CTM 
TCCGAATTTG ATTTCATTQT CAOAfACTTC TTT^TCC TGOCCATTCT CACCO***, 
«=«TCTC» ATOGACTGQT TCKWCCB ^CC^AT ^ 

««iorcTc caoccaa™ cctaaacco. cocccactc otoocc^ sccccctcc. 

«TOTC=TCC COmocOC, CCCTOCTC^ OACACCAAC. ATCGGTCTGA TTCCTCCMC 
«~m. CCCC^^ «CA Ta((!IC 

"acAcc** cccccccc caagtcattc tocaacccac aoaaaaccct 

"CT^CCC COTCCACW 

raCAiCt0C .ACCCCAACO CCAGCAGCCT 

«»A«»« CCCCT ^ 

" MUB ° B1 "" «""»■« O^CTC ACCCCCCCCT 

-«CCO=TT CTCACAACCC rcOCAACCCA »««CAC„ CCA^CAO CTCXC^ 
AGCTACTCCC AOCCCTCO CACT^ACO CCTTCTCCTT COMCACKT TCCTOroc, 
CCCCCCCCTS CACC^OCCC CAACCCC«A OCOC^CCCT CrcCAO^A TGAOAOCTAC 
C=™=AC„ a™^ AX™^ CCC^^c CITTTCAMt CAOO^AO 
—««» CA^^A 0CTCAT*A0 CTACA^AC, TCCAATCWA CCASACCCCC 
TCMM * 0CS """^ — — » ""TCTCAAO CAAA^C AAACATTCGA 
«"«-« cCCACCCtT TOC^AACO CTTGAACAGA ACTCCTTCCA ATTATOGCAA 
OCACTTCAT TCJTACIGTA A<™ TOM miTOoT1! „„„„„ 

MMCtTOa C,<3,TO " 1 »»»™» ««««T»=A CTCTATTTCC TOCCCCCK* 



2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480
3540 
3600 
3660 
3720 
3780
3840 
3900 
3960 
4020 
4080 
4140 
4200 
4260 
4320 
4380 
4440 
4500 
4560 
4620 
CCACTCCTCC CCCAGAGTGG CCAGACCACA COGCCCCTTT CCCCTCTGTA CATTGCTCTC 4680

TGTCCCACAA CCAAGCTTAA CTTACTTTTA AAAAAAATCT CCCACCATAT CTCCCTCCTC 4740 

CTTAAATATT GTATAATTTA CTTGTATAAT TCTATGCAAA TATTGCTTAT GTAATAGGAT 4800 

TATTTGTAAA GGTTTCTGTT TAAAATATTT TAAATTTGCA TATCACAACC CTCTGGTAGG 4860 

ATGAATTGTT ACTGTTAACT TTTGAACACC CTATCCOTGG TAATTGTTTA ACGAGCAGAC 4920 

ATGAAGAAAA CAGGTTAATC CCAGTGCCTT CTCTAGGGGT AGTTGTATAT GGTTCGCATG 4980 

GGTGCATCTG TGTGTGCATG TGACTTTCCA ATCTACTGTA TTGTGGTTTG TTGTTGTTGT 5040 

TGCTGTTGTT GTTCATTTTG CTGTTTTTGC TTGCTTTGTA TGATCTTAGC TCTGGCCTAG 5100 

GTGGGCTCGG AAGGTCCAGC TCTTTTTCTG TCCTCATGCT GGTGGAAAGG TGACCCCAAT 5160 

CATCTGTCCT ATTCTCTGGG ACTATTC 5187 
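
Each nucleotide row in the listing ends with a running position counter, which gives a simple way to detect rows that lost or gained bases during transcription. The following is a minimal sketch of such a check, assuming rows are formatted as blocks of bases followed by one integer; the function name `check_counters` and its `start` parameter are illustrative only.

```python
def check_counters(rows, start=0):
    """Return (row, computed_position) for rows whose counter disagrees.

    Each row is a string of whitespace-separated base blocks followed by
    the printed running counter. `start` is the position reached before
    the first row supplied.
    """
    mismatches = []
    position = start
    for row in rows:
        *chunks, counter = row.split()
        position += sum(len(chunk) for chunk in chunks)
        if position != int(counter):
            mismatches.append((row, position))
            # Resynchronise on the printed counter so one bad row
            # does not flag every row after it.
            position = int(counter)
    return mismatches


# Last two rows of SEQ ID NO:9 as printed; the preceding row ends at 5100.
tail = [
    "GTGGGCTCGG AAGGTCCAGC TCTTTTTCTG TCCTCATGCT GGTGGAAAGG TGACCCCAAT 5160",
    "CATCTGTCCT ATTCTCTGGG ACTATTC 5187",
]
print(check_counters(tail, start=5100))  # []
```

An empty result indicates only that the row lengths agree with the printed counters; it does not show that the individual bases were transcribed correctly.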
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1434 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Met Ala Ser Ala Gly Asn Ala Ala Gly Ala Leu Gly Arg Gln Ala Gly
1 5 10 15

Gly Gly Arg Arg Arg Arg Thr Gly Gly Pro His Arg Ala Ala Pro Asp
20 25 30

Arg Asp Tyr Leu His Arg Pro Ser Tyr Cys Asp Ala Ala Phe Ala Leu
35 40 45

Glu Gln Ile Ser Lys Gly Lys Ala Thr Gly Arg Lys Ala Pro Leu Trp
50 55 60

Leu Arg Ala Lys Phe Gln Arg Leu Leu Phe Lys Leu Gly Cys Tyr Ile
65 70 75 80

Gln Lys Asn Cys Gly Lys Phe Leu Val Val Gly Leu Leu Ile Phe Gly
85 90 95

Ala Phe Ala Val Gly Leu Lys Ala Ala Asn Leu Glu Thr Asn Val Glu
100 105 110

Glu Leu Trp Val Glu Val Gly Gly Arg Val Ser Arg Glu Leu Asn Tyr
115 120 125
Thr Arg cin Ly. II. Cly „. Clu Ala *.t Ph. A .„ Pro cln ^ ^

AJ * 140 
lie Cln Thr Pro Lye Glu Clu Civ Al. v-i » 

145 150 y W * A<m Y** Leu Thr Thr Clu Ala 



155 



160 



Leu Leu Cln Hie Leu Aap s.r Al* Leu Cln Ala Ser Arg Val Hi. v.l 
" 5 170 175 



Tyr M.t Tyr A.n Arg cln Trp Ly . l.u oi„ Hi. Leu Cy. Tyr Ly. ser 

185 190 

Gly Glu L.u lie Thr Clu Thr Cly Tyr Met Aep cln 11. „. Clu ^ 
195 200 20s * 

L.u Tyr Pro cya Leu 11. ii. Thr Pro ^ A „ p ^ phe ^ ^ ^ 

Alj Ly. Leu Cln S.r Cly Thr Al. Tyr Leu L . u Gly Ly . Pro pro ^ 
230 2 « 240 

Arg Trp Thr Aan Phe Aap Pro Leu Clu Phe Leu Clu Clu Leu Ly. Lys 



250 



255 



He Aan Tyr Cln Val Aap Ser Trp clu clu Met Le„ Aan Lya Al. clu 



265 



270 



val Cly His Cly Tyr Met Aap Arg Pro Cya Leu Aan Pro Ala A.p Pro 

285 

A.p cya Pro Al. Thr Al. Pro Aan Ly. A.„ s. r Thr Lya Pro Leu Aap 

295 300 
Val Ala Leu V.l Leu Aan cly Cly Cya cln Cly Leu Ser Arg Lya Tyr 



315 



320 



Met Hi. Trp Cln Clu Clu Leu 11. v.l Cly cly Thr V.l Ly. A.n Ala 

330 335 
Thr Cly Ly. Leu v.l ser Al. Hi. Al. Leu Cln Thr Met Phe cm Leu 



345 



350 



Met Thr Pro Lys Gin Met Tyr Clu Hie php ^ m. 

355 lio ^ r A " P *** Val 



365 



ser Hi. 11. Aan Trp A-n Clu A.p Arg Al. Ala Al. lie Leu Clu Ala 

375 380 
Trp Cln Arg Thr Tyr v.l clu Val V.l Hi. cln Ser Val Ala Pro Aan 

395 400 
S« Thr Cln Ly. Val Leu Pro Phe Thr Thr Thr Thr Leu Aap Aap He 

410 415 

Leu Lye s.r Phe Ser Aep Val Ser V.l He Arg Val Ala s. r cly Tyr 

425 430 
Leu Leu Met Leu Ala Tyr Ala Cys Leu Thr Met Leu Arg Trp Asp Cys
435 440 445

Ser Lys Ser Gln Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala
450 455 460

Leu Ser Val Ala Ala Gly Leu Gly Leu Cys Ser Leu Ile Gly Ile Ser
465 470 475 480

Phe Asn Ala Ala Thr Thr Gln Val Leu Pro Phe Leu Ala Leu Gly Val
485 490 495

Gly Val Asp Asp Val Phe Leu Leu Ala His Ala Phe Ser Glu Thr Gly
500 505 510

Gln Asn Lys Arg Ile Pro Phe Glu Asp Arg Thr Gly Glu Cys Leu Lys
515 520 525

Arg Thr Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala
530 535 540

Phe Phe Met Ala Ala Leu Ile Pro Ile Pro Ala Leu Arg Ala Phe Ser
545 550 555 560

Leu Gln Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu
565 570 575

Ile Phe Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Glu Asp Arg
580 585 590

Arg Leu Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val
595 600 605

Ile Gln Val Glu Pro Gln Ala Tyr Thr Glu Pro His Ser Asn Thr Arg
610 615 620

Tyr Ser Pro Pro Pro Pro Tyr Thr Ser His Ser Phe Ala His Glu Thr
625 630 635 640

His Ile Thr Met Gln Ser Thr Val Gln Leu Arg Thr Glu Tyr Asp Pro
645 650 655

His Thr His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Glu Ile Ser
660 665 670

Val Gln Pro Val Thr Val Thr Gln Asp Asn Leu Ser Cys Gln Ser Pro
675 680 685

Glu Ser Thr Ser Ser Thr Arg Asp Leu Leu Ser Gln Phe Ser Asp Ser
690 695 700

Ser Leu His Cys Leu Glu Pro Pro Cys Thr Lys Trp Thr Leu Ser Ser
705 710 715 720

Phe Ala Glu Lys His Tyr Ala Pro Phe Leu Leu Lys Pro Lys Ala Lys
725 730 735

Val Val Val Ile Leu Leu Phe Leu Gly Leu Leu Gly Val Ser Leu Tyr
740 745 750
«, ». - ~* v., „ „ ,. p „. vji

U 765 

Arg Glu Thr Aro Glu Tvr A* n t»w~ <r, 

7?o 9 Tyr A.p Phe Ile Ala W| ^ ^ ^ ^ ^ 

780 

Ser Phe Tyr Asn Met Tyr He v*i *w- , 

785 £ Q Val «n Lye Ala Asp Tyr Pro Asn 

95 800 

a. «. -u u. J. ^ ». p ^ ph> Mr ^ ^ 

10 815 
* V tl H. t £ „ u „. ^ Ly . _ ^ Mn ^ ^ 

825 830 

*' & A " *" - IZ «« «• -P «• p.. ».p s„ 

w 845 
*, «. »r «, „, Xl . ^ A .„ ... ^ ^ ^ ^ 

° 860 

£ «, v., M . ly . _ ^ vM ^ ^ w 

875 880 
n. »-P U. «„ „„ ^ rar ^ Lay , 41 A . p M . 

890 895 



" P S *" - £ ^ - «r «. Trp 

5 910 

v« s« j. ,. P ^ m M . A1 . M> ^ Hn au ^ ^ 

* U 925 



Pro His Arg Pro Glu Tn> Vai 

930 TrP JJJ Hi8 A8 P *y» Ala Asp Tyr Het Pro Glu 

3 940 

-j *. - J. Al . ai« „„ „ clo A1 . olo , h> 

955 960 



P "* ** - £ - - -J ~ -P - V.X „„ M . 

0 975 

«. «. L y . « ^ m n . e,. A .,, ^ ^ ^ ^ ^ 

w 990 

Ser Ser Tyr Pro Asn oiy Tyr p— ptw , - BV 

995 y Tyr Pro Ph. Leu Phe Trp Qlu Mb ^ ^ 



1000 1005 



ser ^ Hi8 Trp Leu ser ne ^ ^ ^ 

AU15 1020 r 

S.*" V ' 1 * £„ V " "» - - S *" - »• «r 

35 1040 

"* ~ Si. 1 - J--. - P*. My «. t 

1050 1055 
1060 1065 1070

Ile Ala Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu
1075 1080 1085

Ala Phe Leu Thr Ala Ile Gly Asp Lys Asn His Arg Ala Met Leu Ala
1090 1095 1100

Leu Glu His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu
1105 1110 1115 1120

Leu Gly Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg
1125 1130 1135

Tyr Phe Phe Ala Val Leu Ala Ile Leu Thr Val Leu Gly Val Leu Asn
1140 1145 1150

Gly Leu Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Cys Pro
1155 1160 1165

Glu Val Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro
1170 1175 1180

Glu Pro Pro Pro Ser Val Val Arg Phe Ala Val Pro Pro Gly His Thr
1185 1190 1195 1200

Asn Asn Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr
1205 1210 1215

Val Ser Gly Ile Ser Glu Glu Leu Arg Gln Tyr Glu Ala Gln Gln Gly
1220 1225 1230

Ala Gly Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro
1235 1240 1245

Val Phe Ala Arg Ser Thr Val Val His Pro Asp Ser Arg His Gln Pro
1250 1255 1260

Pro Leu Thr Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Ser
1265 1270 1275 1280

Pro Gly Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly
1285 1290 1295

Leu Arg Pro Pro Pro Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser
1300 1305 1310

Thr Glu Gly His Ser Gly Pro Ser Asn Arg Asp Arg Ser Gly Pro Arg
1315 1320 1325

Gly Ala Arg Ser His Asn Pro Arg Asn Pro Thr Ser Thr Ala Met Gly
1330 1335 1340

Ser Ser Val Pro Ser Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser
1345 1350 1355 1360

Ala Ser Val Thr Val Ala Val His Pro Pro Pro Gly Pro Gly Arg Asn
1365 1370 1375
Pro Arg Gly Oly Pro Cy» Pro Gly Ty r clu - „ _ „

1380 7 JjL *** Pr ° Clu Thr *»P 

1385 1390 



^ Sis"" ° lU "» " «J~ - ~ «« ~ »r, c. 01u 

A * WO 1405 

Arg Arg Asp Ser Lye Val clu Val n- G i u T _ 

1410 1415 LeU Cln As P v *l Clu Cys 

1420 

Clu^ciu Arg Pro Trp Oly Ser Ser Ser Aso 



1430 

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
Ile Ile Thr Pro Leu Asp Cys Phe Trp Glu Gly
1 5 10

(2) INFORMATION FOR SEQ ID NO:12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Leu Ile Val Gly Gly
1 5

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Pro Phe Phe Trp Glu Gln Tyr

1 5

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
GGACGAATTC AARGTNCAYC ARYTNTGG 28
(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 
GGACGAATTC CYTCCCARAA RCANTC 26 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

WO 96/11260 PCT/US95/13233

CGACGAATTC YTNGANTCYT TYTGGCA 27

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "primer"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
CATACCAGCC AACCTTCTCN GGCCARTGCA T 31
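
The primers of SEQ ID NOS:14-17 use IUPAC ambiguity codes (R, Y, N), so each listed primer stands for a pool of distinct oligonucleotides. The following is a minimal sketch of how the size of each pool can be computed, assuming the standard IUPAC nucleotide code; the names `IUPAC` and `degeneracy` are illustrative and are not part of the disclosure.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    seq = "".join(primer.split()).upper()
    return prod(len(IUPAC[base]) for base in seq)


# Primer sequences as printed in the listing.
primers = {
    "SEQ ID NO:14": "GGACGAATTC AARGTNCAYC ARYTNTGG",
    "SEQ ID NO:15": "GGACGAATTC CYTCCCARAA RCANTC",
    "SEQ ID NO:16": "CGACGAATTC YTNGANTCYT TYTGGCA",
    "SEQ ID NO:17": "CATACCAGCC AACCTTCTCN GGCCARTGCA T",
}
for name, seq in primers.items():
    print(name, degeneracy(seq))
# SEQ ID NO:14 256
# SEQ ID NO:15 32
# SEQ ID NO:16 128
# SEQ ID NO:17 8
```

The counts depend on the ambiguity codes having been transcribed correctly, so they should be read as a property of the printed sequences.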

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5288 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 
GAATTCCGGG GACCGCAAGG AGTGCOGCGG AAGCGCCCGA AGGACAGCCT CCCTCGGOGC 60
CCCGGCTCTC GCTCTTCCGC CAACTGGATC TCCGCAGCGG CGGCCGCAGA GACCTCGGCA 120
CCCCCGCGCA ATGTGGCAAT GGAAGGCGCA GGGTCTGACT CCCCGGCAGC GGCCGCGGCC 180
GCAGCGGCAG CAGCGCCCGC CGTGICAGCA GCAGCAGCGG CTGGTCTGTC AACCGGAGCC 240
OGAGCCCGAG CAGCCTGCGG CCAGCAGCGT CCTCCCAAGC CGAGCGCCCA GGCGCGCCAG 300
GAGCCCGCAG CAGOGGCAGC AGCGCGCCGG GCCGCCCCGG AAGCCTCCGT CCCCGCGGCG 360
GCGGCGCOGG CGCCGGOGGC AAGATGGCCT CGGCTGGTAA CGCCGCCGAG CCCCAGGACC 420
GCGGCGGCGG CGGCAGCGGC TGTATCGGTO CCCCGGGAOG GCCGCCTGGA GGCGGGAGGC 480
GCAGACGGAC GGGGGGGCTG CGCCGTGCTG CCGCGCCGGA CCGGGACTAT CTGCACCGGC 540
CCAGCTACTG CGACCCCGCC TTOGCTCTCC AGCAGATTTC CAAGGCGAAG GCTACTGGCC 600
GGAAAGCGCC ACTGTGGCTG AGAGCGAAGT TTCAGAGACT CTTATTTAAA CTGGGTTCTT 660
ACATTCAAAA AAACTCCCGC AAGTTCTTCG TTGTGCCCCT CCTCATATTT GGGGCCTTOG 720
CGCTGGGATT AAAAGCAGCG AACCTCGAGA CGAACGTGGA GGAGCTGTGG GTGGAAGTTG 780

GAGGAOGAGT AAGTCGTGAA TTAAATTATA CTCGCCAGAA GATTCGAGAA CAGCCTATGT 840 

TTAATCCTCA ACTCATGATA CAGACCCCTA AAGAAGAAGG TGCTAATGTC CTGACCACAC 900 

AAGCGCTCCT ACAACACCTG GACTCGGCAC TCCAGGCCAG CCGTGTCCAT GTATACATGT 960 

ACAACAGGCA GTGGAAATTG GAACATTTGT GTTACAAATC AGGAGAGCTT ATCACAGAAA 1020 

CAGGTTACAT GGATCAGATA ATAGAATATC TTTACCCTTG TTTGATTATT ACACCTTTGG 1080 

ACTGCTTCTG GGAAGGGGCG AAATTACAGT CTGGGACAGC ATACCTCCTA GGTAAACCTC 1140 

CTTTGCGGTG GACAAACTTC GACCCTTTGG AATTCCTGGA AGAGTTAAAG AAAATAAACT 1200 

ATCAAGTGGA CAGCTGGGAG GAAATGCTGA ATAAGGCTGA GGTTGGTCAT GGTTACATGG 1260 

ACCGCCCCTG CCTCAATCCG GCCGATCCAG ACTGCCCCGC CACAGCCCCC AACAAAAATT 1320 

CAACCAAACC TCTTGATATG GCCCTTGTTT TGAATGGTGG ATGTCATGCC TTATCCAGAA 1380 

AGTATATGCA CTGGCAGGAG GAGTTGATTG TGGGTGGCAC ACTCAAGAAC AGCACTGGAA 1440 

AACTCGTCAG CGCCCATGCC CTGCAGACCA TGTTCCAGTT AATGACTCCC AAGCAAATGT 1500 

ACGAGCACTT CAAGGGGTAC GAGTATGTCT CACACATCAA CTGGAACGAG GACAAAGCGG 1560 

CAGCCATCCT GGAGCCCTGG CAGAGGACAT ATGTGGAGGT GGTTCATCAG AGTGTCGCAC 1620 

AGAACTCCAC TCAAAAGGTG CTTTCCTTCA CCACCACGAC CCTGGACGAC ATCCTGAAAT 1680 

CCTTCTCTGA CGTCAGTGTC ATCCGCGTGG CCAGCGGCTA CTTACTCATG CTCGCCTATG 1740 

CCTGTCTAAC CATCCTGCGC TGGGACTGCT CCAAGTCCCA GGGTGCCGTG GGGCTGGCTG 1800 

GCGTCCTGCT GGTTGCACTG TCAGTGGCTG CAGGACTGGG CCTGTGCTCA TTCATCGGAA 1860 

TTTCCTTTAA CGCTGCAACA ACTCAGGTTT TCCCATTTCT CGCTCTTGGT GTTGCTGTGG 1920 

ATGATGTTTT TCTTCTGGCC CACGCCTTCA GTGAAACAGG ACAGAATAAA AGAATCCCTT 1980 

TTGAGGACAG GACCGGGGAG TGCCTGAAGC GCACAGGAGC CAGCGTGGCC CTCACGTCCA 2040 

TCAGCAATGT CACAGCCTTC TTCATGCCCG CGTTAATCCC AATTCCCGCT CTGCGGGCGT 2100 

TCTCCCTCCA GGCAGCGGTA GTAGTGGTGT TCAATTTTGC CATGGTTCTG CTCATTTTTC 2160 

CTGCAATTCT CAGCATGGAT TTATATCCAC GCGAGGACAG GAGACTGGAT ATTTTCTGCT 2220 

GTTTTACAAG CCCCTGCGTC AGCAGAGTGA TTCAGGTTGA ACCTCAGGCC TACACCGAGA 2280 

CACACGACAA TACCCGCTAC AGCCCCCCAC CTCCC7ACAG CAGCCACAGC TTTGCCCATG 2340 

AAACGCAGAT TACCATGCAG TCCACTGTCC AGCTCCGCAC GCAGTACGAC CCCCACACGC 2400 

ACGTGTACTA CACCACCGCT GAGCOCCGCT CCGAGATCTC TGTCCAGCCC GTCACOGTCA 2460 
CACAGGACAC CCTCAGCTGC CAGAGCCCAG AGAGCACCA CTCCACAAGG GACCTGCTCT 2520
CCCAGTTCTC CCACTCCACC CTCCACTGCC TOGACCCCCC CTGTAOGAAG TGGACACTCT 2580
CATCTTTTGC TGAGAAGCAC TATGCTCCTT TCCTCTTGAA ACCAAAACCC AAGGTAGTGG 2640
TGATCTTCCT TTTTCTCGGC TTCCTCGGGG TCAGCCTTTA TGGCACCACC CGAGTGAGAG 2700
ACGGGCTGGA CCTTACOGAC ATTGTACCTC GGGAAACCAG AGAATATGAC TTTATTCCTG 2760
CACAATTCAA ATACTTTTCT TTCTACAACA TGTATATAGT CACCCAGAAA CCAGACTACC 2820
CCAATATCCA GCACTTACTT TACCACCTAC ACAGGAGTTT CAGTAACCTC AAGTATGTCA 2880
TGTTGGAAGA AAACAAACAO CTTCCCAAAA TGTGGCTGCA CTACTTCAGA CACTGCCTTC 2940
AGGGACTTCA GGATGCATTT GACAOTGACT CGCAAACCCG GAAAATCATG CCAAACAATT 3000
ACAAGAATGG ATCACACGAT GGAGTCCTTG CCTACAAACT CCTGGTGCAA ACCGGCAGCC 3060
GCGATAAGCC CATCGACATC ACCCACTTGA CTAAACAGCG TCTGCTGGAT GCAGATOGCA 3120
TCATTAATOC CAGOCCTTTC TACATCTACC TOAOCGCTTG GGTCAGCAAC GACCCCGTCG 3180
CGTATGCTCC CTCCCAGGCC AACATCCOCC CACACCGACC AGAATGGGTC CACGACAAAG 3240
COGACTACAT GCCTCAAACA AGGCTGAGAA TCCOGGCAGC ACAGCCCATC GAGTATCCCC 3300
ACTTCCCTTT CTACCTCAAC GGCTTGOGGG ACACCTCAGA CTTTGTCGAG CCAATTGAAA 3360
AACTAAGGAC CATCTCCACC AACTATACGA GCCTGGGGCT GTCCACTTAC CCCAAOGGCT 3420
ACCCCTTCCT CTTCTGGGAG CAGTACATCC GCCTCCGCCA CTGGCTCCTG CTCTTCATCA 3480
GCGTCGTGTT GGCCTCCACA TTCCTCGTGT GOGCTGTCTT CCTTCTGAAC CCCTGGACGG 3540
CCGGGATCAT TGTGATGGTC CTGGOGCTGA TGAOGGTCGA GCTCTTCCGC ATGATGGGCC 3600
TCATOGGAAT CAAGCTCAGT GCCGTGCCOG TOOTCATCCT GATCCCTTCT CTTGGCATAG 3660
GAGTGGACTT CACCCTTCAC GTTGCTTTGG CCTTTCTGAC GGCCATOGGC CACAAGAACC 3720
GGAGGGCTGT GCTTGCCCTG GAGCACATGT TTGCACCCGT CCTGCATGGC GCCGTGTCCA 3780
CTCTGCTGGG AGTCCTGATG CTGGCGGGAT CTGACTTCGA CTTCATTGTC AGGTATTTCT 3840
TTGCTGTGCT GCCCATCCK ACCATCCTCG COCTTCTCAA TGGGCTGCTT TTGCTTCCCG 3900
TCCTTTTGTC TTTCTTTGGA CCATATCCTC AGGTCTCTCC AGCCAACGGC TTGAACCGCC 3960
TCCCCACACC CTCCCCTGAG CCACCCCCCA COGTOGTCCG CTTCCCCATG CCGCCCGCCC 4020
AGACGCACAG CGGGTCTGAT TCCTCCGACT CGGAGTATAG TTOCCACACG ACACTGTCAG 4080
GCCTGAGCGA GGAGCTTOGG CACTACGAOG CCCAGCAGCG CGCCGGACGC CCTGCCCACC 4140
AAGTCATCGT GGAAGCCACA GAAAACCCOG TCTTCGCCCA CTCCACTCTG GTCCATCCCG 4200
AATCCAGGCA TCACCCACCC TOGAACCCGA GACAGCAGCC CCACCTGGAC TCAGGGTCCC 4260
TGCCTCCCCG ACGGCAAGGC CAGCAGCCCC GCAGGGACCC CCCCAGAGAA GGCTTGTGGC 4320
CACCCCTCTA CAGACOGCGC AGAGACGCTT TTGAAATTTC TACTGAAGGG CATTCTGGCC 4380
CTAGCAATAG GGCCOGCTGG GGCCCTCGCG GGGCCCGTTC TCACAACCCT CGGAACCCAG 4440
CGTCCACTGC CATGGGCACC TCCGTGCCCG GCXACTGCCA GCCCATCACC ACTGTGAOGG 4500
CTTCTCCCTC CGTQACTGTC GCCGTGCACC CGCCGCCTGT CCCTGGGCCT GGGCGGAACC 4560
CCCGAGGGGG ACTCTGCCCA GGCTACCCTG AGACTGACCA CGGCCTGTTT GAGGACCCCC 4620
ACGTGCCTTT CCACGTCCGG TGTGAGAGGA GGGATTCGAA GGTGGAAGTC ATTGAGCTGC 4680
AGGACGTGGA ATGCGAGGAG AGGCCCOGGG CAAGCAGCTC CAACTGAGGG TGATTAAAAT 4740
CTGAAGCAAA GAGGCCAAAG ATTGGAAACC CCCCACCCCC ACCTCTTTCC AGAACTGCTT 4800
GAAGAGAACT GGTTGGAGTT ATGGAAAAGA TGCCCTCTCC CAGGACAGCA GTTCATTGTT 4860
ACTGTAACCG ATTGTATTAT TTTGTTAAAT ATTTCTATAA ATATTTAAGA GATGTACACA 4920
TGTCTAATAT AGGAAGGAAG GATGTAAAGT GGTATGATCT GGGGCTTCTC CACTCCTGCC 4980
CCAGAGTGTG GAGGCCACAG TGGGGCCTCT CCGTATTTGT CCATTGGGCT CCGTGCCACA 5040
ACCAAGCTTC ATTAGTCTTA AATTTCAGCA TATGTTGCTG CTGCTTAAAX ATTGTATAAT 5100
TTACTTGTAT AATTCTATGC AAATATTGCT TATGTAATAG GATTATTTTG TAAAGGTTTC 5160
TGTTTAAAAT ATTTTAAATT TGCATATCAC AACCCTGTGG TAGTATGAAA TGTTACTGTT 5220
AACTTTCAAA CACGCTATGC GTGATAATTT TTTTGTTTAA TGAGCAGATA TGAAGAAAGC 5280
CCGGAATT 5288

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1447 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

Met Ala Ser Ala Gly Asn Ala Ala Glu Pro Gln Asp Arg Gly Gly Gly
1 5 10 15

Gly Ser Gly Cys Ile Gly Ala Pro Gly Arg Pro Ala Gly Gly Gly Arg
20 25 30
Arg Arg Arg Thr Cly Cly Leu Arg Aro Ala ai» ai. b »

35 ~* 9 Ala A1 * A 1 * Pro Asp Arg Asp 

40 45 

Tyr Leu Hi. Arg Pro Ser Tyr Cy. A.p A l. Ala phe ^ ^ ^ ^ 



60 

II. S.r Ly. Cly Ly. Al. Thr cly ^ ty , ^ ^ ^ ^ ^ ^ 

75 80 
Ala Ly. Ph. 01n Arg Leu Leu Ph. Ly. Leu Cly cya Tyr lie Cln Ly. 

90 95 

Aan Cya Cly Ly. Ph. Leu V.l Val cly Leu Leu II. Ph . «y Al. Phe 

105 110 

Al. V.1 Cly Leu Ly. Al. Ala Aan Leu Clu thr Aan Val clu Clu L.„ 

120 125 

Trp Val Glu Val Cly cly Aro V*l * ^, . 

130 ?ff Val Ser Glu Asn Tyr Thr Arg 

135 140 

Cln Ly. xi. cly Clu Clu Al. Met Phe A.n Pro cln Leu Met Xle Cl„ 
" 0 155 160 

Thr Pro Ly. clu clu Cly Al. Aa„ v.l Leu Thr Thr Clu Al. Leu Leu 

170 1?5 

oin His Leu Aap ser Al. Leu Cln Al. Ser Arg v.l „ ia v.i Tyr Met 

18S 190 
Tyr A.n Arg cln Trp Ly. Leu Clu Hi. Leu Cy. Tyr Ly. Ser cly ci„ 

20< ? 205 

«« Xle Thr Clu Thr Cly Tyr Met Aap cln XI. xle Clu Tyr Leu Tyr 

215 220 
Pro cy. Leu Xle Xle Thr Pro Leu Aap Cy. Phe Trp clu cly Al. Ly. 

235 240 
Leu Cln Ser Cly Thr Al. Tyr Leu Leu cly Ly. Pro Pro Leu Arg Trp 

250 2S5 
Tnr A.n Ph. jjp Pro Leu Clu Phe Leu Clu Clu Leu Ly. Ly. XI. Aan 

265 270 
Tyr Cln v.l A.p ser Trp clu Clu Met Leu Aen Ly. Al. ci u v.l cly 

280 265 

His Cly Tyr Met Asp Arg Pro Cys Leu Amh Pr« m. , _ 

290 * 0 ll T MU A8n Pro A1 * A *P **o Asp cys 

295 300 

Pro Al. Thr Al. Pro Aan Ly. Aan Ser Thr Ly. Pro Leu Aap Met Al. 

315 320 
L.u V.1 Leu Aan Cly Cly Cy. Hi. Cly Leu Ser Arg Ly. Tyr Met Hi. 

330 33S 
Trp Cln Clu Clu Leu Xle v.l cly cly Thr Val Ly. Aan Ser Thr Cly 
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340 



345 



350 



Lye Leu Val Ser Ala Hie Ala Leu Gin Thr Met Phe Gin Leu Met Thr 
355 360 365 

• Pro Lys Gin Met Tyr Glu Hie Phe Lys Gly Tyr Glu Tyr Val Ser Hie 
370 375 380 

lie Aon Trp Aen Glu Asp Lye Ala Ala Ala He Leu Glu Ala Trp Gin 
365 390 395 400 

Ara Thr Tyr Val Glu Val Val His Gin Ser Val Ala Gin Asn Ser Thr 
405 410 415 

Gin Lys Val Leu ser Phe Thr Thr Thr Thr Leu Asp Aep He Leu Lye 
420 425 430 



Ser Phe Ser Aep Val Ser Val He Arg Val Ala Ser Gly Tyr Leu Leu 
435 440 445 

Met Leu Ala Tyr Ala Cye Leu Thr Met Leu Arg Trp Asp Cys Ser Lys 
450 455 460 

Ser Gin Gly Ala Val Gly Leu Ala Gly Val Leu Leu Val Ala Leu Ser 
465 470 475 480 

val Ala Ala Gly Leu Gly Leu Cys Ser Leu He Gly He ser Phe Aen 
4S5 490 495 

Ala Ala Thr Thr Gln Val Leu Pro Phe Leu Ala Leu Gly Val Gly Val
500 505 510

Asn Asp Val Phe Leu Leu Ala His Ala Phe Ser Glu Thr Gly Gln Asn
515 520 525

Lys Arg Ile Pro Phe Glu Asp Arg Thr Gly Glu Cys Leu Lys Arg Thr
530 535 540

Gly Ala Ser Val Ala Leu Thr Ser Ile Ser Asn Val Thr Ala Phe Phe
545 550 555 560

Met Ala Ala Leu Ile Pro Ile Pro Ala Leu Arg Ala Phe Ser Leu Gln
565 570 575

Ala Ala Val Val Val Val Phe Asn Phe Ala Met Val Leu Leu Ile Phe
580 585 590

Pro Ala Ile Leu Ser Met Asp Leu Tyr Arg Arg Glu Asp Arg Arg Leu
595 600 605

Asp Ile Phe Cys Cys Phe Thr Ser Pro Cys Val Ser Arg Val Ile Gln
610 615 620

Val Glu Pro Gln Ala Tyr Thr Asp Thr His Asp Asn Thr Arg Tyr Ser
625 630 635 640

Pro Pro Pro Pro Tyr Ser Ser His Ser Phe Ala His Glu Thr Gln Ile
645 650 655

Thr Met 61 „ Ser Thr Val cm Leu Arg Thr Glu ^ ^ His ^
665 670

His Val Tyr Tyr Thr Thr Ala Glu Pro Arg Ser Glu Ile Ser Val Cl„
680 685

Pro Val Thr Val Thr Cl„ Asp Thr Leu Ser Cys Gln Ser Pro Glu Ser
695 700

Thr Ser Ser Thr Arg Asp Leu Leu Ser Gln Phe Ser Asp Ser Ser Leu
715 720



His Cys Leu Glu Pro Pro Cys Thr Lys Trp Thr Leu Ser Ser Phe Ma
725 730 735

Glu Lys His Tyr Ala Pro Phe Leu Leu Lys Pro Lys Ala Lys Val Val



Val Ile Phe Leu Phe Leu Gly L.« Leu Gly Val Ser Leu Tyr Gly Thr
755 760 765

Thr Arg Val Arg Asp Gly Leu Asp Leu Thr Asp Ile Val Pro Arg Glu
775 780

Thr Arg Glu Tyr Asp Phe Ile Ala Ala Gln Phe Lys Tyr Phe Ser Phe
795 800

Tyr Ab„ Met Tyr Ile Val Thr Gln Lys Ala Asp Tyr Pro Asn Ile Gln
810 815

His Leu Leu Tyr Asp Leu His Arg Ser Phe Ser Asn Val Lys Tyr Val
825 830
Met Leu Glu Glu Asn Lys Gln Leu Pro Lys Met Trp Leu His Tyr Phe
840 845

Arg Asp Trp Leu Gln Gly Leu Gln Asp Ala Phe Asp Ser Asp Trp Glu
850 855 860



Thr Gly Lys Ile Met Pro A.„ Asn Tyr Lys Asn Gly Ser Asp Asp Gly
875 880

Val Leu Ala Tyr Lys Leu Leu Val Gln Thr Gly Ser Arg Asp Lys Pro

Ile Asp Ile Ser Gln Leu Thr Lys Gln Arg Leu Val JUp Ala Asp Gly
905 910

Ile Ile Jen Pro Ser Ala Phe Tyr Ile Tyr Leu Thr Ala Trp Val Ser
920 925

Asn Asp Pro Val Ala Tyr Ala Ala Ser Gln Ala Asn Ile Arg Pro His
935 940

Arg Pro Glu Trp Val His Asp Lys Ala Asp Tyr Met Pro Glu Thr Arg
955 960



Leu Arg Ile Pro Ala Ala Glu Pro Ile Glu Tyr Ala Gln Phe Pro Phe
965 970 975

Tyr Leu Asn Gly Leu Arg Asp Thr Ser Asp Phe Val Glu Ala Ile Glu
980 985 990

Lys Val Arg Thr Ile Cys Ser Asn Tyr Thr Ser Leu Gly Leu Ser Ser
995 1000 1005

Tyr Pro Asn Gly Tyr Pro Phe Leu Phe Trp Glu Gln Tyr Ile Gly Leu
1010 1015 1020

Arg His Trp Leu Leu Leu Phe Ile Ser Val Val Leu Ala Cys Thr Phe
1025 1030 1035 1040

Leu Val Cys Ala Val Phe Leu Leu Asn Pro Trp Thr Ala Gly Ile Ile
1045 1050 1055

Val Met Val Leu Ala Leu Met Thr Val Glu Leu Phe Gly Met Met Gly
1060 1065 1070

Leu Ile Gly Ile Lys Leu Ser Ala Val Pro Val Val Ile Leu Ile Ala
1075 1080 1085

Ser Val Gly Ile Gly Val Glu Phe Thr Val His Val Ala Leu Ala Phe
1090 1095 1100



Leu Thr Ala Ile Gly Asp Lys Asn Arg Arg Ala Val Leu Ala Leu Glu
1105 1110 1115 1120

His Met Phe Ala Pro Val Leu Asp Gly Ala Val Ser Thr Leu Leu Gly
1125 1130 1135

Val Leu Met Leu Ala Gly Ser Glu Phe Asp Phe Ile Val Arg Tyr Phe
1140 1145 1150



Phe Ala Val Leu Ala Ile Leu Thr Ile Leu Gly Val Leu Asn Gly Leu
1155 1160 1165

Val Leu Leu Pro Val Leu Leu Ser Phe Phe Gly Pro Tyr Pro Glu Val
1170 1175 1180

Ser Pro Ala Asn Gly Leu Asn Arg Leu Pro Thr Pro Ser Pro Glu Pro
1185 1190 1195 1200

Pro Pro Ser Val Val Arg Phe Ala Met Pro Pro Gly His Thr His Ser
1205 1210 1215

Gly Ser Asp Ser Ser Asp Ser Glu Tyr Ser Ser Gln Thr Thr Val Ser
1220 1225 1230

Gly Leu Ser Glu Glu Leu Arg His Tyr Glu Ala Gln Gln Gly Ala Gly
1235 1240 1245

Gly Pro Ala His Gln Val Ile Val Glu Ala Thr Glu Asn Pro Val Phe
1250 1255 1260

Ala His Ser Thr Val Val His Pro Glu Ser Arg His His Pro Pro Ser
1265 1270 1275 1280

Asn Pro Arg Gln Gln Pro His Leu Asp Ser Gly Ser Leu Pro Pro Gly
1285 1290 1295

Arg Gln Gly Gln Gln Pro Arg Arg Asp Pro Pro Arg Glu Gly Leu Trp
1300 1305 1310

Pro Pro Leu Tyr Arg Pro Arg Arg Asp Ala Phe Glu Ile Ser Thr Glu
1315 1320 1325

Gly His Ser Gly Pro Ser Asn Arg Ala Arg Trp Gly Pro Arg Gly Ala
1330 1335 1340

Arg Ser His Asn Pro Arg Asn Pro Ala Ser Thr Ala Met Gly Ser Ser
1345 1350 1355 1360

Val Pro Gly Tyr Cys Gln Pro Ile Thr Thr Val Thr Ala Ser Ala Ser
1365 1370 1375

Val Thr Val Ala Val His Pro Pro Pro Val Pro Gly Pro Gly Arg Asn
1380 1385 1390

Pro Arg Gly Gly Leu Cys Pro Gly Tyr Pro Glu Thr Asp His Gly Leu
1395 1400 1405

Phe Glu Asp Pro His Val Pro Phe His Val Arg Cys Glu Arg Arg Aso
1410 1415 1420

Ser Lys Val Glu Val Ile Glu Leu Gln Asp Val Glu Cys Glu Glu Arg
1425 1430 1435 1440

Pro Arg Gly Ser Ser Ser Asn 
1445 



WO 96/11260

PCT/US95/13233



WHAT IS CLAIMED IS: 

1. A DNA sequence other than present in a chromosome encoding a patched gene
other than the Drosophila patched gene or fragment thereof of at least about 12bp
different from the sequence of the Drosophila patched gene.

2. A DNA sequence according to Claim 1, wherein said patched gene is a
mammalian gene. 

3. A DNA sequence according to Claim 1 for human, mouse, mosquito, butterfly
or beetle patched gene. 

4. A DNA sequence according to Claim 3, wherein said DNA sequence is a 
human sequence. 


5. A DNA sequence according to Claim 4, wherein said DNA sequence is a 
mouse sequence. 

6. A DNA sequence according to Claim 1, wherein said DNA sequence is a 
fragment of at least about 18bp.

7. A DNA sequence according to Claim 1 joined to a DNA sequence comprising 
a restriction enzyme recognition sequence. 

8. An expression cassette comprising a transcriptional initiation region functional
in an expression host, a DNA sequence according to Claim 1 under the
transcriptional regulation of said transcriptional initiation region, and a
transcriptional termination region functional in said expression host.

9. An expression cassette according to Claim 8, wherein said transcriptional
initiation region is heterologous to said DNA sequence according to Claim 1.

10. An expression cassette according to Claim 8, wherein said transcriptional
initiation region is homologous to said DNA sequence and
includes the enhancer region.

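11. A cell comprising an expression cassette according to Claim 8 as part of an
extrachromosomal element or integrated into the genome of a host cell as a result
of introduction of said expression cassette into said host cell, and the cellular
progeny thereof.

12. A cell according to Claim 11, further comprising the patched protein in the
cellular membrane of said cell.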



13. A cell according to Claim 11, wherein said patched protein is a mouse
protein.






14. A cell according to Claim 11, wherein said patched protein is a [...]
protein.



15. [...] and enhancer joined to a heterologous gene.

16. A cell comprising an expression cassette comprising a transcriptional initiation
region functional in an expression host, said transcriptional initiation region
consisting of a 5' non-coding region regulating [...]
comprising the promoter and enhancer, a marker gene, and a transcriptional
termination region, as part of an extrachromosomal element or integrated into the
genome of a host cell as a result of introduction of said expression cassette into said
host, and the cellular progeny thereof.



17. A cell according to Claim 16, wherein said transcriptional initiation region is
the Drosophila region. 

18. A method for following embryonic development employing the patched
protein in an embryo, said method comprising:

integrating an expression cassette comprising a transcriptional initiation region
functional in embryonic host cells, said transcriptional initiation region consisting of
a 5' non-coding region regulating the transcription of patched protein, a marker
gene, and a transcriptional termination region, wherein said embryonic host cells are
capable of developing into a fetus;

growing said embryonic host cells, whereby proliferation and differentiation 
occur; and 

locating cells comprising expression of the patched protein by means of 
expression of said marker gene. 


19. A method for producing patched protein, said method comprising: 
growing a cell according to Claim 11, whereby said patched protein is
expressed; and

isolating said patched protein free of other proteins. 


20. A method for screening candidate compounds for binding affinity to the 
patched protein, said method comprising: 

combining said candidate protein with a vertebrate or invertebrate cell
comprising said patched protein in the membrane of said cell and an expression
cassette comprising a transcriptional initiation region functional in said cell, a DNA
sequence according to Claim 1 comprising the entire coding sequence under the
transcriptional regulation of said transcriptional initiation region, and a
transcriptional termination region functional in said cell, expressing said patched
protein in said cell; and

assaying for the binding of said candidate compound to said patched protein.

21. A method for screening candidate compounds for agonist activity with the
patched protein, said method comprising: 

combining said candidate protein with a vertebrate or invertebrate cell
comprising said patched protein in the membrane of said cell and an expression
cassette comprising a transcriptional initiation region functional in an expression
host, said transcriptional initiation region consisting of a 5' non-coding region 
regulating the transcription of patched protein, a marker gene, and a transcriptional 
termination region, as part of an extrachromosomal element or integrated into the 
genome of a host cell; and 
assaying for the expression of said marker gene.

22. A monoclonal antibody binding specifically to a patched protein, other than 
the Drosophila patched protein.